Is there a way to split an output file into smaller chunks?

Witw Registered Posts: 2

Hi, I have a Dataiku flow that formats data and writes CSV files to Azure Blob Storage.

However, the output files are over 100 MB each. We would like to control the output file size by splitting the data into smaller files of no more than 50 MB each.

Is this possible to do in a Dataiku flow?

Thanks


Operating system used: Windows

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,108 Neuron

    Certainly possible. Calculate your average row size, work out how many rows fit in 50 MB, and then split the data into files of that many rows. It would need Python, but it's not that complicated to do.
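As a rough illustration of the approach described above, here is a minimal sketch in plain Python using only the standard library. It is not Dataiku-specific: the function names (`encoded_row`, `rows_per_file`, `split_into_chunks`) and the `safety` margin are hypothetical, and reading the dataset and writing each chunk to the Azure output are left out. It assumes the rows fit in memory and are encoded as UTF-8 CSV.

```python
import csv
import io


def encoded_row(row):
    """Return one row as UTF-8 encoded CSV bytes, including the line ending."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    return buf.getvalue().encode("utf-8")


def rows_per_file(rows, max_bytes, header_bytes=0, safety=0.9):
    """Estimate how many rows fit in one file from the average row size.

    `safety` (an assumption of this sketch) leaves headroom because an
    average does not bound the size of any particular group of rows.
    """
    total = sum(len(encoded_row(r)) for r in rows)
    avg = total / len(rows)
    budget = max_bytes * safety - header_bytes
    return max(1, int(budget // avg))


def split_into_chunks(header, rows, max_bytes, safety=0.9):
    """Split rows into CSV byte chunks, each starting with the header."""
    if not rows:
        return []
    header_b = encoded_row(header)
    n = rows_per_file(rows, max_bytes, len(header_b), safety)
    chunks = []
    for start in range(0, len(rows), n):
        body = b"".join(encoded_row(r) for r in rows[start:start + n])
        chunks.append(header_b + body)
    return chunks


# Small demonstration with a 10 KB limit instead of 50 MB.
header = ["id", "value"]
rows = [[str(i).zfill(6), "x" * 50] for i in range(1000)]
chunks = split_into_chunks(header, rows, max_bytes=10_000)
```

For a 50 MB limit you would pass `max_bytes=50 * 1024 * 1024`. Because the row count comes from an average, a file can still exceed the limit when row sizes vary a lot. Lowering `safety`, or checking each chunk's size before writing it, guards against that.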
