
Prepare recipe parse format

Hello everyone,

A feature that would be very useful for me and many of my colleagues is the ability to parse column types in bulk. This is currently possible for meanings:


But this has no effect on the metadata of the output dataset. That is a problem for us: because our sources have poor completeness, with many null values, types often get reinterpreted in a Prepare recipe (it seems the schema is built from the explore sample only, not from the whole dataset):


Don't hesitate to tell me if this is not clear, or if there is already a solution for this use case.

Thanks a lot,



I'm not sure I understand this.

I recently discovered the feature that applies auto-detected schema types to all columns (which saves me a bunch of time).


That said, this does have problems with large numbers of missing values. Sometimes the first 10,000-row sample does not contain enough data to find the correct type. Sometimes the data is only in the most recent rows (not in the first 10,000 rows from the data source, maybe only in the last 10,000 rows).
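To make the sampling problem concrete, here is a minimal sketch of sample-based type inference. It is not Dataiku's actual detection algorithm; `infer_type`, `looks_like_double`, and the "fall back to string when the sample has no evidence" rule are hypothetical assumptions chosen only to show why a mostly-empty first 10,000 rows can produce the wrong type.

```python
# Hypothetical sketch: infer a column type from only the first `sample_size`
# values, the way a sample-based schema detector might.
# This is NOT Dataiku's real algorithm; the fallback rule is an assumption.


def looks_like_double(value: str) -> bool:
    """Return True if the non-empty string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def infer_type(values: list[str], sample_size: int = 10_000) -> str:
    """Infer 'double' or 'string' from the first `sample_size` values.

    Empty strings are treated as missing and ignored. If the sample holds no
    non-empty value at all, there is no evidence for a numeric type, so this
    sketch falls back to the most general type, 'string'.
    """
    sample = [v for v in values[:sample_size] if v != ""]
    if not sample:
        return "string"  # nothing to go on in the sample
    return "double" if all(looks_like_double(v) for v in sample) else "string"


# A column that is blank in its first 10,000 rows and only filled at the end.
column = [""] * 10_000 + ["12.5", "0", "3.75"]
print(infer_type(column))                           # sample sees only blanks
print(infer_type(column, sample_size=len(column)))  # a full scan sees the amounts
```

The first call sees only blanks and falls back to `string`; scanning every row finds the amounts and returns `double`.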

Thank you for your answer. To illustrate my use case, I have reproduced the issue with a mini flow:


For the file dataset data_source, there is no problem with "infer type schema" or with the set type functionality to force the data type:


The source has the right format (all amounts are double), but when I create a Prepare recipe, some amounts are automatically reinterpreted as string:



I have tried defaulting, but it doesn't work. At this step, the only way is to change the columns manually one by one, which is not simple with 50, 100, or 300 amounts to manage...

This is a specific issue, but quite common in financial/accounting use cases, because null and 0 can have distinct meanings. That's why a set type function in a Prepare recipe (like the one in the file source dataset) would be useful for changing data types in bulk.
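The bulk "set type" being asked for can be sketched outside DSS as a single pass that casts many amount columns at once while keeping empty cells as null instead of turning them into 0. This is a minimal illustration under assumptions: rows are plain dicts of strings, and `to_double`, `cast_columns`, and the `amount_` column prefix are hypothetical names, not a Dataiku API.

```python
# Hypothetical sketch of a "mass set type" step: cast many columns to float
# in one pass, keeping empty cells as None (null) rather than 0.0, so that
# "no value" and "zero" stay distinguishable. Not a Dataiku API.
from typing import Optional


def to_double(value: Optional[str]) -> Optional[float]:
    """Parse a cell as float; a missing or whitespace-only cell stays None."""
    if value is None or value.strip() == "":
        return None
    return float(value)


def cast_columns(rows: list[dict], columns: list[str]) -> list[dict]:
    """Return new rows in which every listed column is cast with to_double."""
    cast_rows = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for col in columns:
            new_row[col] = to_double(row[col])
        cast_rows.append(new_row)
    return cast_rows


rows = [
    {"account": "A1", "amount_1": "100.0", "amount_2": ""},
    {"account": "A2", "amount_1": "0", "amount_2": "42.5"},
]
# Assumed naming convention: every amount column starts with "amount_".
amount_cols = [c for c in rows[0] if c.startswith("amount_")]
print(cast_columns(rows, amount_cols))
```

Selecting the columns by a pattern, rather than listing them, is what makes the step scale to 50, 100, or 300 amounts.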

Maybe if I change the explore sample type, it will refresh the schema (but for me that's not 100% reliable)?




As a Neuron, I suspect that you are on a recent version of DSS.

I have definitely seen some of the kinds of changes you are calling out in visual recipes. Maybe it's not as bad as it was when I started with DSS (or maybe I just understand how to deal with schema problems a little better).


We're using V8.0.2 (but I have already seen this issue with V5). At Generali, our sources are Hive tables, so data formats are (theoretically) well managed before ingestion into DSS.

I did some tests on a local machine with a PostgreSQL table and a CSV file to make sure I could reproduce this case. It makes sense that the more complete the data, the more stable the schemas stay across the DSS flow. I guess null/blank values have to be handled by users before running DSS flows.

I'm aware I can fix it quickly with a CAST in an SQL recipe, but I'm always looking for a graphical way to deal with this, for our clicker/beginner end users 🙂
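For the SQL-recipe workaround, writing one CAST per column by hand does not scale to hundreds of amounts, so the query can be generated instead. A minimal sketch, assuming PostgreSQL syntax (`DOUBLE PRECISION`, double-quoted identifiers) and the hypothetical names `build_cast_query`, `data_source`, `account`, and `amount_1…amount_3`; in SQL a NULL stays NULL after CAST, so nulls are not turned into 0.

```python
# Hypothetical sketch: generate the SELECT ... CAST(...) query for an SQL
# recipe instead of typing one CAST per column by hand.
# Assumes PostgreSQL syntax; table and column names are illustrative only.


def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_cast_query(table: str, keep: list[str], to_double: list[str]) -> str:
    """Select `keep` columns as-is and cast every `to_double` column."""
    select_items = [quote_ident(c) for c in keep]
    select_items += [
        f"CAST({quote_ident(c)} AS DOUBLE PRECISION) AS {quote_ident(c)}"
        for c in to_double
    ]
    return "SELECT " + ",\n       ".join(select_items) + "\nFROM " + quote_ident(table)


amounts = [f"amount_{i}" for i in range(1, 4)]
print(build_cast_query("data_source", ["account"], amounts))
```

The generated text can be pasted into an SQL recipe; regenerating it when columns are added is cheaper than editing it by hand.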

(I have posted a similar issue there.)

Status changed to: Needs Info


Thanks for the suggestion, @Tuong-Vi; we've heard similar requests. If I've understood the thread correctly, you're looking for a visual-friendly way to prevent the Prepare recipe from inferring data types when you want it to always 'inherit' those types from the input dataset?
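The requested "inherit" behavior can be sketched as a merge rule: columns that pass through the recipe untouched keep the type declared in the input schema, and only new or modified columns use the re-inferred type. This is a hypothetical illustration, not Dataiku's schema API; schemas are modeled as plain `{column: type}` dicts and `inherit_types` and `modified_columns` are assumed names.

```python
# Hypothetical sketch of "inherit types from the input dataset": untouched
# columns keep the input schema's declared type; new or modified columns use
# the type re-inferred from the sample. Schemas are plain {name: type} dicts;
# this is not Dataiku's schema API.


def inherit_types(input_schema: dict[str, str],
                  inferred_schema: dict[str, str],
                  modified_columns: set[str]) -> dict[str, str]:
    output = {}
    for col, inferred_type in inferred_schema.items():
        if col in input_schema and col not in modified_columns:
            output[col] = input_schema[col]  # pass-through column: trust the input
        else:
            output[col] = inferred_type      # new or changed column: use inference
    return output


input_schema = {"account": "string", "amount_1": "double", "amount_2": "double"}
# amount_2 was mostly blank in the sample, so inference drifted to string.
inferred = {"account": "string", "amount_1": "double", "amount_2": "string",
            "amount_total": "double"}
print(inherit_types(input_schema, inferred, modified_columns={"amount_total"}))
```

Here `amount_2` gets its declared `double` back, while the computed `amount_total` keeps its inferred type.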



Hi @Tuong-Vi , let me know if I've correctly understood your request. Thanks!


@AshleyW It would be great if Dataiku respected our type and meaning choices through an entire pipeline. We've been quite frustrated by Dataiku's insistence on changing text fields to booleans and decimal fields to scientific notation, and by the need to reparse dates every time they go through a Prepare recipe.



I agree. I don't understand why Dataiku changes the format from Double to String after a Prepare recipe. I checked, and every row is filled with an integer.

[Screenshot: Prepare Recipe changing format]

So why is Dataiku changing this format? How can we avoid this behavior?

Best regards,

Community Manager
Status changed to: Gathering Input