Export a CSV file to HDFS instead of a Dataiku-managed dataset
 
            
                
                    nv                
                
                    I have a recipe that scores customers every day. I want to export the scores as a CSV file to our HDFS every day.
However, for now, each time the recipe runs:
- The output is split into several smaller files (out-s0-c0, out-s0-c1, out-s0-c2, ...)
- The output files appear as out-s0-c0, out-s0-c1, out-s0-c2, ... rather than as CSV, even though I set 'Separated values (CSV, TSV, ...)' as the 'Type'.
Questions:
1. How do I get my export in one file instead of several files (out-s0-c0, out-s0-c1, out-s0-c2, ...)?
2. How do I get the output in CSV format and not as out-s0-c...?
                        
Answers
- 
            The simplest way to obtain a CSV is to download the dataset, which is then consolidated into a single CSV file.
 However, if I understand your question correctly, you want to automatically export an HDFS dataset as a single CSV file. Unfortunately, this is not yet natively supported; you'd need to add a Shell recipe to consolidate those files manually afterwards (and add the extension). The `.csv` extension will be added in the upcoming 2.3 release of DSS.
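
 As a rough illustration of the consolidation step, here is a minimal Python sketch rather than an official DSS feature. It assumes the part files are reachable as a local directory (for example after copying them out of HDFS, or through a mount), that they follow the `out-s<N>-c<M>` naming seen in the question, that they carry no header row, and that each part ends with a newline. The function name `merge_parts` is made up for this example. On a Hadoop client, `hadoop fs -getmerge <src> <localdst>` performs a similar merge directly from HDFS.

```python
import re
import shutil
from pathlib import Path

# Matches part files such as out-s0-c0, out-s0-c1, out-s0-c10 (naming taken from the question).
PART_NAME = re.compile(r"out-s(\d+)-c(\d+)")


def merge_parts(src_dir, dest_file):
    """Concatenate out-s<N>-c<M> part files from src_dir into dest_file.

    Assumes the parts have no header row and each ends with a newline.
    Returns the part file names in the order they were written.
    """
    parts = []
    for path in Path(src_dir).iterdir():
        match = PART_NAME.fullmatch(path.name)
        if match and path.is_file():
            # Keep the numeric indices so that c10 sorts after c2, not after c1.
            parts.append((int(match.group(1)), int(match.group(2)), path))
    parts.sort(key=lambda item: item[:2])

    with open(dest_file, "wb") as out:
        for _, _, path in parts:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, out)
    return [path.name for _, _, path in parts]
```

 A call such as `merge_parts("/data/scores_out", "/data/scores_out/scores.csv")` (both paths are hypothetical) would then leave a single `scores.csv` next to the part files.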
- 
            Also, note that the out-s42* files are already in CSV format. Compared with what you asked for:
 - they are split into several files (due to parallel processing)
 - they lack the ".csv" extension (upgrading to the soon-to-be-released v2.3 will fix this)
 But they should be readable by your downstream application.
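
 To illustrate that last point, a downstream reader can parse the part files as they are, without merging or renaming them. The sketch below is only an assumption about what such a reader might look like: it expects the files in a locally readable directory, UTF-8 encoded, with a comma separator by default (pass `delimiter="\t"` for TSV). The name `read_part_rows` is hypothetical.

```python
import csv
from pathlib import Path


def read_part_rows(src_dir, delimiter=","):
    """Yield parsed rows from every out-s* part file in src_dir.

    Files are visited in plain name order; row order across parts is
    assumed not to matter for the downstream consumer.
    """
    for path in sorted(Path(src_dir).glob("out-s*")):
        with open(path, newline="", encoding="utf-8") as handle:
            yield from csv.reader(handle, delimiter=delimiter)
```

 Iterating over `read_part_rows("/data/scores_out")` (a hypothetical path) would yield each score row as a list of strings, regardless of how many part files the recipe produced.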

