Multiple JSON files in a managed folder to CSV files using Python and sync to S3

Solved!
suhail-bari
Level 1
I have some JSON files in a managed folder. Each file has a different schema. I'm trying to convert them into CSV files and upload them to an S3 bucket.

The way I thought this out: once the files are extracted to the managed folder, I'd run a Python script to extract the results from each file, convert them into a dataset, and have the dataset synced to the S3 bucket.

Another way is to create a Python probe on the managed folder, so that whenever files come in they are converted to CSV and stored in the folder.

I'm new to Dataiku. What's the best way to sync the managed folder to the bucket along with the conversion from JSON to CSV?

1 Solution
suhail-bari
Level 1
Author

Answering my own question: keep the multiple files in folders. Avoid cluttering the screen with hundreds of datasets; that never goes well.

Create a Python probe and convert each JSON object to a DataFrame, then to a dataset.

See the managed folders section of the Dataiku documentation for how to write the results back to a folder.
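
For anyone landing here later, the conversion step could be sketched roughly as follows. This is a minimal sketch, not the exact code from my project: it uses the standard-library `csv` module instead of a DataFrame so that it runs anywhere, and the helper names `json_to_csv_text` and `convert_folder` are made up for illustration. It assumes each JSON file holds either one object or a list of flat-ish objects, and that the folder handles passed in are `dataiku.Folder` objects.

```python
import csv
import io
import json


def json_to_csv_text(json_text):
    """Convert a JSON document (one object or a list of objects) to CSV text."""
    data = json.loads(json_text)
    records = data if isinstance(data, list) else [data]

    # Each file has its own schema, so the header is derived from the file
    # itself, keeping keys in first-seen order.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Nested values are serialised back to JSON so they fit in one cell;
        # keys missing from a record are left as empty cells.
        writer.writerow({
            key: json.dumps(value) if isinstance(value, (dict, list)) else value
            for key, value in record.items()
        })
    return buffer.getvalue()


def convert_folder(source, target):
    """Write a .csv file into `target` for every .json file found in `source`.

    Both arguments are assumed to be dataiku.Folder handles (hypothetical
    usage; any object with the same three methods works).
    """
    for path in source.list_paths_in_partition():
        if not path.lower().endswith(".json"):
            continue
        with source.get_download_stream(path) as stream:
            json_text = stream.read().decode("utf-8")
        csv_path = path[:-len(".json")] + ".csv"
        target.upload_data(csv_path, json_to_csv_text(json_text).encode("utf-8"))
```

In a Python recipe this would be called as something like `convert_folder(dataiku.Folder("json_input"), dataiku.Folder("csv_output"))`, where both folder names are placeholders. If the output folder is created on an S3 connection, the CSV files should end up in the bucket without a separate sync step, but check that against your own connection setup.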

2 Replies
CoreyS
Community Manager

Thank you for sharing your solution @suhail-bari
