
Extra columns created from nowhere

Jacques
Level 2

Hi all,

When I run a flow, some of my datasets have extra columns created from I don't know where, which produces warnings.

Have you already experienced this kind of issue? Thank you all for your help.

 

Best regards,

Jacques


Operating system used: Windows

2 Replies
Grixis
Level 4

Hello @Jacques 

I suspect the schema of your input CSV dataset is being misinterpreted.

Can you check the settings of CIB_ITO_GB_CIO_GB_s0.csv.gz? Double-click the dataset to open Explore, then go to Settings > Format.

Try 'Redetect' and look at what is shown; the warning message should be explicit. Update the preview and check again whether the schema and the data are consistent.

I suspect a parameter such as the delimiter is wrong, either because the file format was misdetected (as xlsx, for example) or because the wrong type was selected in the suggestion list. Test the configurations. Once the preview shows the expected schema, propagate the schema from this dataset to the dependent datasets by right-clicking it and choosing "Propagate schema across Flow from here".
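To check the delimiter outside DSS, a minimal sketch with the Python standard library could look like the following. It assumes the file is UTF-8 text and that the delimiter is one of `,`, `;`, tab or `|`; the function names are only illustrative and are not part of any Dataiku API.

```python
import csv
import gzip
import io


def read_gz_sample(path, n_chars=65536):
    """Read the first characters of a gzip-compressed text file (assumes UTF-8)."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        return handle.read(n_chars)


def sniff_delimiter(sample_text, candidates=",;\t|"):
    """Guess the delimiter of a CSV sample; raises csv.Error if none fits."""
    dialect = csv.Sniffer().sniff(sample_text, delimiters=candidates)
    return dialect.delimiter


def column_count(sample_text, delimiter):
    """Count the columns of the first row when parsed with the given delimiter."""
    reader = csv.reader(io.StringIO(sample_text), delimiter=delimiter)
    return len(next(reader))
```

For example, `column_count(read_gz_sample("CIB_ITO_GB_CIO_GB_s0.csv.gz"), ";")` can be compared with the same call using `","`. If one delimiter gives the expected number of columns and the other gives a very different number, the format settings of the dataset are the likely culprit.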

I can't see much in your screenshot, so while waiting for more information, step 3 "Import and explore data" of this module may help: https://academy.dataiku.com/excel-to-dataiku-dss-quick-start/858783

Jacques
Level 2
Author

Hello Grixis,

First of all, thank you for your time and your help.

In the first attached image, the warnings concern the meanings and types of columns from the final dataset. These columns do not exist in the current dataset (dataset_Original), which has only 45 columns (see attached flow). What I do not understand is why it refers to the columns of Dataset_final, when it is supposed to hold the result of the recipe output.

 

When I check the preview again, it detects an error, but after I update the schema the error is gone (the schema is consistent). The dataset still has 150 columns, though, which is the total number of columns in Dataset_final.
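To see exactly which columns are unexpected, the two lists of column names can be compared. This is only a sketch in plain Python: `diff_columns` is an illustrative helper, and the column names would have to be copied from each dataset's schema (or, as an assumption, read in a DSS Python notebook from `dataiku.Dataset(...).read_schema()`).

```python
def diff_columns(expected, actual):
    """Compare two lists of column names, preserving their original order.

    Returns the names missing from `actual` and the names that appear
    in `actual` but were not expected.
    """
    expected_set = set(expected)
    actual_set = set(actual)
    return {
        "missing": [name for name in expected if name not in actual_set],
        "extra": [name for name in actual if name not in expected_set],
    }


# Hypothetical column names, for illustration only.
original_columns = ["id", "amount", "date"]
observed_columns = ["id", "amount", "date", "score", "segment"]
report = diff_columns(original_columns, observed_columns)
print(report["extra"])
```

If every name under "extra" also exists in Dataset_final, the stored schema of the dataset was probably overwritten with the final dataset's schema rather than taken from the file.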

 

Thank you for your reply.

BR,

Jacques

 
