Converting multiple JSON files in a managed folder to CSV files using Python and syncing to S3

suhail-bari Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 2 ✭✭✭✭

I have some JSON files in a managed folder. Each file has a different schema. I'm trying to convert them into CSV files and upload those to an S3 bucket.

My first idea was this: once the files have been extracted to the managed folder, I'd run a Python script to extract the results from each file, convert them into a dataset, and have the dataset synced to the S3 bucket.

Another way is to create a Python probe on the managed folder. Whenever new files come in, they are converted into CSV and stored in the folder.
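For illustration, the per-file conversion step could look roughly like the sketch below. It uses only the Python standard library (`json`, `csv`) rather than pandas, and the helper names `flatten` and `json_to_csv` are made up for this example. It assumes each file holds either one JSON object or an array of objects; nested objects become dotted column names, and the header is built from whatever keys appear, so files with different schemas each get their own columns. List values are written as their string representation.

```python
import csv
import io
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into one dict with dotted keys."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


def json_to_csv(json_text):
    """Convert a JSON object, or an array of objects, to CSV text."""
    data = json.loads(json_text)
    records = data if isinstance(data, list) else [data]
    rows = [flatten(record) for record in records]

    # Build the header from the keys in order of first appearance,
    # so the schema does not need to be known in advance.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Records that lack a column are written with an empty cell for it.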

I'm new to Dataiku. What's the best way to sync the managed folder to the bucket and convert the files from JSON to CSV along the way?


Best Answer

  • suhail-bari
    suhail-bari Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 2 ✭✭✭✭
    Answer ✓

    Answering my own question: keep the multiple files in folders. Avoid filling the screen with hundreds of datasets; that never goes well.

    Create a Python probe that converts each JSON object to a DataFrame (df) and then to a dataset.

    Look up managed folders in the Dataiku documentation to see how to write the results to a folder.
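    A minimal sketch of the folder-to-folder approach, under some assumptions: `convert_folder` is a hypothetical helper, `src` and `dst` are expected to behave like `dataiku.Folder` handles (it relies on `list_paths_in_partition`, `get_download_stream` and `upload_data`), and `convert` is any function of your own that turns JSON bytes into CSV bytes. The folder names in the trailing comment are placeholders.

```python
def convert_folder(src, dst, convert):
    """Convert every .json file in `src` and write a .csv file to `dst`.

    `src` and `dst` are assumed to behave like dataiku.Folder handles.
    `convert` is any callable mapping JSON bytes to CSV bytes.
    Returns the list of paths written to `dst`.
    """
    written = []
    for path in src.list_paths_in_partition():
        # Skip anything in the folder that is not a JSON file.
        if not path.lower().endswith(".json"):
            continue
        with src.get_download_stream(path) as stream:
            raw = stream.read()
        # Same path in the output folder, with the extension swapped.
        out_path = path[: -len(".json")] + ".csv"
        dst.upload_data(out_path, convert(raw))
        written.append(out_path)
    return written


# In a DSS Python recipe this might be wired up as follows
# (folder names and my_json_to_csv are placeholders):
#
#   import dataiku
#   src = dataiku.Folder("json_input")
#   dst = dataiku.Folder("csv_output")
#   convert_folder(src, dst, my_json_to_csv)
```

    If the output managed folder is created on an S3 connection, writing to it should place the CSV files in the bucket directly, so a separate sync step may not be needed. Check this against your own connection setup.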
