Extra columns created from nowhere

Jacques
Jacques Dataiku DSS Core Designer, Registered Posts: 6

Hi all,

When I run a flow, some of my datasets end up with extra columns, and I don't know where they come from. This produces warnings.

Has anyone already experienced this kind of issue? Thank you all for your help.

Best regards,

Jacques


Operating system used: Windows

Answers

  • Grixis
    Grixis PartnerApplicant, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 77 ✭✭✭✭✭

    Hello @Jacques

    I suspect the schema of your input CSV dataset is being misinterpreted.

    Can you look at the settings of CIB_ITO_GB_CIO_GB_s0.csv.gz? Double-click the dataset to explore it, then go to Settings and then Format.

    Try a 'redetect' and look at what is shown; the warning message should be explicit. Update the preview and check again whether the schema and the data are consistent.
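
    To illustrate what "schema and data are consistent" means, here is a minimal sketch in plain Python, run outside DSS. It does not use any Dataiku API; the function name, the sample data and the declared column list are all hypothetical.

```python
import csv
import io

def compare_columns(declared, csv_text, delimiter=","):
    """Compare a declared column list with the header row of CSV text.

    Returns (missing, extra): declared columns absent from the data,
    and data columns absent from the declared list.
    """
    header = next(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    declared_set, header_set = set(declared), set(header)
    missing = [c for c in declared if c not in header_set]
    extra = [c for c in header if c not in declared_set]
    return missing, extra

# Hypothetical example: the schema declares one column the file does not have.
sample = "id,name,amount\n1,a,10\n"
declared_schema = ["id", "name", "amount", "region"]
missing, extra = compare_columns(declared_schema, sample)
print("declared but not in data:", missing)
print("in data but not declared:", extra)
```

    If the first list is not empty, the schema declares columns that the file does not contain, which is the kind of mismatch the warning points to.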

    I suspect that a parameter such as the delimiter is wrong, either because the file format was misdetected (for example as xlsx) or because the wrong type was selected in the suggestion list. Test the different configurations. Once you find the expected schema in the preview, you will have to propagate the schema from your new dataset to the dependent datasets by right-clicking and choosing "Propagate schema across Flow from here".
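
    As a rough illustration of why the delimiter matters, the sketch below parses the same text with several candidate delimiters and counts the resulting columns. It is plain Python with the standard `csv` module, not DSS code, and the sample content and the `column_count` helper are made up for the example.

```python
import csv
import io

def column_count(text, delimiter):
    """Number of columns found in the first row for a given delimiter."""
    first_row = next(csv.reader(io.StringIO(text), delimiter=delimiter))
    return len(first_row)

# Hypothetical file content that actually uses ';' as its separator.
sample = "id;name;amount\n1;a;10\n"

for candidate in [",", ";", "\t"]:
    print(repr(candidate), "->", column_count(sample, candidate), "column(s)")
```

    Only the matching delimiter yields the expected three columns; the others collapse each row into a single column, so the detected schema no longer matches the data.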

    While waiting for more information (I can't see much in your screenshot), step 3, "Import and explore data", of this module may help: https://academy.dataiku.com/excel-to-dataiku-dss-quick-start/858783

  • Jacques
    Jacques Dataiku DSS Core Designer, Registered Posts: 6

    Hello Grixis,

    First of all, thank you for your time and your help.

    In fact, in the first attached image, I have warnings about the meanings and types of columns from the final dataset that do not exist in the current dataset (dataset_Original), which has only 45 columns (see attached flow). What I do not understand is why it refers to the columns of Dataset_final when it is supposed to get the result of the recipe output.

    When I check the preview again, it detects an error, but after I update the schema the error is gone (the schema is consistent). However, I still have 150 columns, which is the total number of columns in Dataset_final.

    Thank you for your reply.

    BR,

    Jacques
