
Export a file in CSV format to HDFS instead of a dataset managed by Dataiku

I have a recipe that scores customers every day. I want to export the scores as a CSV file to our HDFS every day.

However, for now, each time the recipe runs:

- The output is split into several smaller files (out-s0-c0, out-s0-c1, out-s0-c2, ...).

- The output files are named out-s0-c0, out-s0-c1, out-s0-c2, ... with no .csv extension, even though I set 'Type' to 'Separated values (CSV, TSV...)'.



Questions:
1. How do I get my export as a single file instead of several files (out-s0-c0, out-s0-c1, out-s0-c2, ...)?
2. How do I get the output as a .csv file rather than files named out-s0-c...?
2 Replies
Dataiker
The simplest way to obtain a CSV is to download the dataset; the download is consolidated into a single CSV file.

However, if I understand your question correctly, you want to automatically export an HDFS dataset as a single CSV file. Unfortunately, this is not yet natively supported. You would need to add a Shell recipe that consolidates those files afterwards and adds the extension. The `.csv` extension will be added in the upcoming 2.3 release of DSS.
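The consolidation step itself is just an ordered concatenation. Here is a minimal sketch of it in Python rather than shell, assuming the part files have already been copied or mounted into a local directory, contain no header row, and follow the `out-s<N>-c<N>` naming from the question. The function names `merge_parts` and `part_sort_key` and the target name `scores.csv` are made up for illustration. On the cluster itself, `hdfs dfs -getmerge` performs a similar concatenation into a local file.

```python
import re
import shutil
from pathlib import Path


def part_sort_key(path):
    """Order parts numerically so out-s0-c2 comes before out-s0-c10."""
    return [int(n) for n in re.findall(r"\d+", path.name)]


def merge_parts(part_dir, target, pattern="out-s*"):
    """Concatenate all part files in part_dir into one file at target.

    Assumption: the parts carry no header row, so plain concatenation
    yields a valid CSV. Returns the list of parts in merge order.
    """
    parts = sorted(Path(part_dir).glob(pattern), key=part_sort_key)
    if not parts:
        raise FileNotFoundError(f"no files matching {pattern!r} in {part_dir}")

    with open(target, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
                # If a part does not end with a newline, add one so its
                # last row is not glued to the first row of the next part.
                if src.tell() > 0:
                    src.seek(-1, 2)
                    if src.read(1) != b"\n":
                        out.write(b"\n")
    return parts
```

Calling `merge_parts("/path/to/parts", "/path/to/scores.csv")` writes the single file with the extension already in place. The merged file would still have to be pushed back to HDFS as a separate step.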
Dataiker Alumni
Also, note that the out-s42* files are already in CSV format. They differ from what you asked for in two ways:
- they are split into several files (because of parallel processing)
- they lack the ".csv" extension (upgrading to the soon-to-be-released v2.3 will fix this)

But they should be readable by your downstream application.
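If the downstream application is under your control, it can read the parts directly without merging them. Here is a rough sketch in Python, assuming the parts are reachable as local files, have no header row, and are comma-separated. The delimiter depends on the dataset's format settings, and `iter_rows` is a name made up for this example.

```python
import csv
from pathlib import Path


def iter_rows(part_dir, pattern="out-s*", delimiter=","):
    """Yield rows from every part file as if they were one CSV.

    Assumption: no header row in the parts, and row order across
    parts does not matter to the consumer.
    """
    for part in sorted(Path(part_dir).glob(pattern)):
        with open(part, newline="") as handle:
            yield from csv.reader(handle, delimiter=delimiter)
```

Each yielded row is a list of strings, so a consumer can loop over `iter_rows("/path/to/parts")` exactly as it would over a reader on a single file.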