When importing several flat files at once, is there a way to ascertain that all files were imported?

a_bouffard Registered Posts: 9 ✭✭✭✭

I am working on data that is stored in 6 text files (export_PART1.tab, export_PART2.tab... export_PART6.tab).

To import them all at once, I drag and drop the six files to the "Drag and drop your files here" area in DSS.

DSS then displays the following warning "Creates a single dataset: multiple files must have the same schema."

and indicates "Used export_PART3.tab to parse data".

To figure out whether the six files were indeed all added to the dataset (and not only export_PART3.tab), right now I:

  • check the "status" tab of the output dataset and generate the "record count" metric
  • import the file export_PART3.tab alone into another dataset
  • generate its "record count" metric
  • and finally compare it with the first record count metric...

That is a pretty long process...

Is there any way, during the import phase, to check which files were correctly imported and which were not imported because of schema inconsistencies?

I've seen the "SCHEMA CONSISTENCY / CHECK ON ALL FILES" button in the Advanced tab of the import page, but it only allows checking schema consistency.

What I would like to know is which files were imported into the dataset.

Best Answer

  • Clément_Stenac
    Clément_Stenac Dataiker, Dataiku DSS Core Designer, Registered Posts: 753 Dataiker
    Answer ✓


    The schema detection and preview in the dataset "Settings" tab is only done on a single file. However, all files are always "imported" and will always be read by DSS, each time you run a recipe.

    The warning about the same schema is mostly a reminder that this is not the way to create 6 independent datasets.

    DSS will not "ignore" a file because of schema inconsistencies. The precise behavior then depends on the file type. If your files are CSV/TSV, DSS will simply read all of them with the same column headers. Thus, if your files have different columns, you may get mangled data, but all your records will be there.
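
As a cross-check outside DSS, the per-file record counts can be summed and compared with the dataset's "record count" metric. The following is a minimal sketch, not a DSS feature: it assumes tab-separated files with exactly one header line each and standard CSV quoting. The function names `count_records` and `count_all` and the glob pattern are hypothetical.

```python
import csv
import glob


def count_records(path, delimiter="\t"):
    """Return the number of data rows in a delimited file, excluding the header."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        next(reader, None)  # skip the header line (assumed to be present)
        return sum(1 for _ in reader)


def count_all(pattern):
    """Map each file matching the glob pattern to its record count."""
    return {path: count_records(path) for path in sorted(glob.glob(pattern))}


if __name__ == "__main__":
    # Hypothetical pattern matching the six files from the question.
    counts = count_all("export_PART*.tab")
    for path, n in counts.items():
        print(f"{path}: {n} records")
    print("total:", sum(counts.values()))
```

If the total matches the metric computed in DSS, every file contributed its rows. Note that blank lines in a file are counted as rows by this sketch, so the two numbers may differ slightly depending on how blank lines are handled.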


  • a_bouffard
    a_bouffard Registered Posts: 9 ✭✭✭✭

    Thanks for the explanation.

    Thus the answer to my question is No, which is too bad.

    --> May I suggest improving the DSS UI to make this "under the hood" behaviour obvious?

    Had I not asked this question here, I could someday have ended up with mangled data without knowing it...

  • PZD
    PZD Registered Posts: 1 ✭✭✭✭


    I read this discussion and I have a similar question. I currently have a dataset made of 10 different CSV files, each representing a monthly update. Now the vendor has made a change to the data, and the schema is different from the previous month's.

    Is there a way to change which file is used to parse the data? I tried the "File for test & preview" option in the advanced options section of the Files tab, but it doesn't work on my end.
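
One way to pin down which monthly files carry the changed schema, independently of DSS, is to compare each file's header row with that of a reference file. This is a minimal sketch, assuming comma-separated files with the column names on the first line; `read_header`, `find_schema_drift`, and any file names passed to them are hypothetical.

```python
import csv


def read_header(path, delimiter=","):
    """Return the first row of a delimited file as a list of column names."""
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle, delimiter=delimiter), [])


def find_schema_drift(reference, others, delimiter=","):
    """Return {path: (missing, extra)} for files whose header differs from the reference."""
    expected = read_header(reference, delimiter)
    drift = {}
    for path in others:
        header = read_header(path, delimiter)
        if header != expected:  # a reordering of the same columns also counts as drift
            missing = [c for c in expected if c not in header]
            extra = [c for c in header if c not in expected]
            drift[path] = (missing, extra)
    return drift
```

An empty result means every header matches the reference. A file whose columns are merely reordered appears in the result with empty `missing` and `extra` lists.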
