Multiple JSON files in a managed folder to CSV files using Python and sync to S3

suhail-bari · February 2022

I have some JSON files in a managed folder. Each file has a different schema. I'm trying to convert them into CSV files and upload them to an S3 bucket.

My first idea was that, once the files have been extracted to the managed folder, I'd run a Python script to extract the results from each file, convert them into a dataset, and have the dataset synced to the S3 bucket.

Another way is to create a Python probe on the managed folder. Whenever new files come in, they can be converted into CSV and stored in the folder.

I'm new to Dataiku. What's the best way to sync the managed folder to the bucket along with the conversion from JSON to CSV?

suhail-bari · April 2022

Answering my own question: keep the multiple files in folders. Avoid cluttering the screen with hundreds of datasets; that never goes well.

Create a Python probe that converts each JSON object to a DataFrame and then to a dataset.

Look up managed folders in the Dataiku documentation to see how to write the results to folders.
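For anyone landing here later, a rough sketch of the folder-to-folder conversion might look like the following. It is not the exact code from my project. It uses the standard library `csv` module instead of a pandas DataFrame, and it assumes each JSON file holds either one object or a list of objects. The helper names (`flatten`, `json_bytes_to_csv_text`, `convert_folder`) are made up for illustration. The folder arguments only need the `list_paths_in_partition`, `get_download_stream` and `upload_data` methods that `dataiku.Folder` provides.

```python
import csv
import io
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        elif isinstance(value, list):
            # Lists are kept as JSON text so each record stays on one row.
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def json_bytes_to_csv_text(raw):
    """Convert one JSON document (an object or a list of objects) to CSV text."""
    data = json.loads(raw.decode("utf-8"))
    records = data if isinstance(data, list) else [data]
    rows = [flatten(record) for record in records]

    # Each file has its own schema, so the header is built per file
    # from the union of keys, in first-seen order.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # keys missing from a row are written as empty cells
    return buffer.getvalue()


def convert_folder(src_folder, dst_folder):
    """Write a .csv file to dst_folder for every .json file in src_folder.

    Both arguments are assumed to behave like dataiku.Folder objects.
    Returns the list of paths that were written.
    """
    written = []
    for path in src_folder.list_paths_in_partition():
        if not path.lower().endswith(".json"):
            continue
        with src_folder.get_download_stream(path) as stream:
            raw = stream.read()
        csv_path = path[: -len(".json")] + ".csv"
        dst_folder.upload_data(csv_path, json_bytes_to_csv_text(raw).encode("utf-8"))
        written.append(csv_path)
    return written
```

In a Python recipe you would call something like `convert_folder(dataiku.Folder("json_input"), dataiku.Folder("csv_output"))`, where both folder names are placeholders. If the output managed folder is created on an S3 connection, the CSV files should end up in the bucket without a separate sync step, though that depends on how your connections are set up.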

CoreyS · April 2022

Thank you for sharing your solution, @suhail-bari!
