Export a file in CSV format to HDFS instead of a managed dataset in Dataiku
nv
I have a recipe that scores customers every day. I want to export the scores to a CSV file on our HDFS every day.
However, for now, each time the recipe runs:
- The output is split into several smaller files (out-s0-c0, out-s0-c1, out-s0-c2, ...).
- The output files are named out-s0-c0, out-s0-c1, out-s0-c2, ... instead of being CSV files, even though I set 'Type' to 'Separated values (CSV, TSV, ...)'.
Question:
1. How do I get my export in one file instead of several files (out-s0-c0, out-s0-c1, out-s0-c2, ...)?
2. How do I get the output as a CSV file instead of the out-s0-c... files?
Answers
The simplest way to obtain a CSV file is to download the dataset, which is then consolidated into a single CSV file.
However, if I understand your question correctly, you want to automatically export an HDFS dataset as a single CSV file. Unfortunately, this is not yet natively supported; you would need to add a Shell recipe afterwards that consolidates those files manually (and adds the extension). The `.csv` extension will be added in the upcoming 2.3 release of DSS.
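As a rough sketch of that consolidation step, the Python function below concatenates the part files of a dataset folder into one file with a `.csv` extension. It assumes the parts are already available as a local directory (for example, copied out of HDFS), are uncompressed, and follow the `out-s*-c*` naming shown in the question; `consolidate_parts`, `part_sort_key`, the glob pattern and the `header_lines` parameter are hypothetical names, not part of DSS. Parts are ordered numerically so that `out-s0-c10` comes after `out-s0-c2`. If you stay in a Shell recipe, the standard Hadoop command `hdfs dfs -getmerge` performs a similar concatenation from an HDFS directory into a local file.

```python
import re
from pathlib import Path
from typing import List, Union


def part_sort_key(path: Path) -> List[Union[int, str]]:
    """Natural sort key: out-s0-c2 sorts before out-s0-c10."""
    # re.split with a capturing group alternates text and digit runs,
    # so every key has strings and ints at the same positions.
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", path.name)]


def consolidate_parts(src_dir: str, dest_file: str,
                      pattern: str = "out-s*-c*", header_lines: int = 0) -> int:
    """Concatenate the part files in src_dir into one file; return how many were merged.

    header_lines (an assumption about your output format): number of header
    lines at the top of each part; only the first part keeps them.
    """
    parts = sorted(Path(src_dir).glob(pattern), key=part_sort_key)
    with open(dest_file, "wb") as out:
        for i, part in enumerate(parts):
            with open(part, "rb") as f:
                if i > 0:
                    for _ in range(header_lines):
                        f.readline()  # drop repeated header lines
                last = b""
                while chunk := f.read(1 << 20):  # stream in 1 MiB chunks
                    out.write(chunk)
                    last = chunk[-1:]
                if last and last != b"\n":
                    out.write(b"\n")  # keep the next part's first row on its own line
    return len(parts)
```

A call such as `consolidate_parts("scores_parts", "scores.csv")` (placeholder paths) would produce the single file. Write the merged file outside the source folder, or give it a name the pattern does not match, so that a later run does not pick it up as another part.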
Also, note that the out-s42* files are already in CSV format. Compared to what you asked for:
- they are split into several files (due to parallel processing)
- they lack the ".csv" extension (upgrade to the soon-to-be-released v2.3 to fix this)
But they should be readable by your downstream application.
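To illustrate that last point, a downstream Python consumer can treat the split output as one logical CSV stream without merging anything. This is a minimal sketch, assuming the parts are uncompressed, UTF-8, comma-separated and readable from a local or mounted directory; `iter_rows`, `_part_numbers` and the `header_rows` parameter are hypothetical names, not part of DSS.

```python
import csv
import re
from pathlib import Path
from typing import Iterator, List


def _part_numbers(path: Path) -> List[int]:
    """Numeric groups in a part name, e.g. out-s0-c10 -> [0, 10]."""
    return [int(n) for n in re.findall(r"\d+", path.name)]


def iter_rows(src_dir: str, pattern: str = "out-s*-c*",
              delimiter: str = ",", header_rows: int = 0) -> Iterator[List[str]]:
    """Yield the rows of every part file, in part order, as if they were one CSV."""
    for part in sorted(Path(src_dir).glob(pattern), key=_part_numbers):
        # newline="" lets the csv module handle quoted fields that contain line breaks
        with open(part, newline="", encoding="utf-8") as f:
            reader = csv.reader(f, delimiter=delimiter)
            for _ in range(header_rows):
                next(reader, None)  # skip per-part header rows, if any
            yield from reader
```

Because it is a generator, only one part file is open at a time and the rows are never all held in memory.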