Export file in CSV-format on hdfs instead of a managed dataset by DataIku

nv
nv Registered Posts: 11 ✭✭✭✭
I have a recipe that scores customers every day. I want to export the scores as a CSV file to our HDFS every day.

However, for now, each time the recipe runs:

- The output is split into several smaller files (out-s0-c0, out-s0-c1, out-s0-c2, ...).

- The output files are named out-s0-c0, out-s0-c1, out-s0-c2, ... instead of having a .csv extension, even though I set 'Type' to 'Separated values (CSV, TSV, ...)'.



Questions:
1. How do I get my export in one file instead of several files (out-s0-c0, out-s0-c1, out-s0-c2, ...)?
2. How do I get the output as a .csv file rather than as out-s0-c....?

Answers

  • AdrienL
    AdrienL Dataiker, Alpha Tester Posts: 196 Dataiker
    The simplest way to obtain a CSV is to download the dataset, which is then consolidated into a single CSV file.

    However, if I understand your question correctly, you want to automatically export an HDFS dataset as a single CSV file. Unfortunately, this is not yet natively supported: you'd need to add a Shell recipe to consolidate those files manually afterwards (and add the extension). The `.csv` extension will be added in the upcoming 2.3 release of DSS.
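The consolidation step described above could look roughly like the following Python sketch. It is not a DSS feature: `merge_part_files` is a hypothetical helper, and it assumes the part files are reachable as ordinary local paths (for example after copying them out of HDFS), that they carry no header row, and that each one ends with a newline.

```python
import shutil
from pathlib import Path


def merge_part_files(parts_dir, output_csv, pattern="out-s*"):
    """Concatenate every part file matching `pattern` into one CSV file.

    Assumptions (not confirmed by the thread): the parts have no header
    row and each ends with a newline, so plain concatenation is valid.
    """
    # Lexicographic order: row order across parts is not preserved exactly
    # (out-s0-c10 sorts before out-s0-c2), which is fine for unordered scores.
    parts = sorted(p for p in Path(parts_dir).glob(pattern) if p.is_file())
    if not parts:
        raise FileNotFoundError(f"no files matching {pattern!r} in {parts_dir}")
    with open(output_csv, "wb") as merged:
        for part in parts:
            with open(part, "rb") as chunk:
                # Stream the bytes so large parts are never held in memory.
                shutil.copyfileobj(chunk, merged)
    return [p.name for p in parts]
```

If the files must stay on HDFS, the standard `hdfs dfs -getmerge <src> <localdst>` command performs a comparable merge into a single local file, which can then be given a `.csv` name.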
  • jrouquie
    jrouquie Dataiker Alumni Posts: 87 ✭✭✭✭✭✭✭
    Also, note that the out-s42* files are already in CSV format. Compared to your request:
    - they are split into several files (due to parallel processing)
    - they lack the ".csv" extension (upgrading to the soon-to-be-released v2.3 will fix this)

    But they should be readable by your downstream application.
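To illustrate the point that the parts are readable as they are, here is a minimal sketch of a downstream reader that walks the part files directly, without merging or renaming them. `iter_rows` is a hypothetical helper; the `out-s*` pattern, the UTF-8 encoding, and the comma delimiter are assumptions, so adjust them to match the dataset's actual format settings.

```python
import csv
from pathlib import Path


def iter_rows(parts_dir, pattern="out-s*", delimiter=","):
    """Yield parsed rows from every part file, one part after another."""
    for part in sorted(Path(parts_dir).glob(pattern)):
        # The missing ".csv" extension does not matter to the csv module.
        with open(part, newline="", encoding="utf-8") as handle:
            yield from csv.reader(handle, delimiter=delimiter)
```

Calling `list(iter_rows("/path/to/parts"))` would return all rows as lists of strings; the path here is only a placeholder.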