
Column descriptions lost with next recipe

Is there a way to keep the column descriptions along the pipeline? If I add column descriptions at the beginning of the pipeline, it seems like I need to add them again for every output along the pipeline. Is this the case or am I doing something wrong?

Thanks,
Simon
Dataiker
Hi,

When you add a column description at the beginning of a pipeline, that's a change to the schema of that dataset. That schema change needs to be propagated to all datasets along the pipeline: https://answers.dataiku.com/1237/is-there-a-way-to-propagate-schema-changes-in-a-whole-flow
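For descriptions, propagation amounts to carrying each column's description over to the column with the same name downstream. Below is a minimal sketch of that matching step on plain Python dictionaries. It assumes a schema shaped like `{"columns": [{"name": ..., "type": ..., "comment": ...}]}` with the description stored under `comment`; that shape and the helper name `copy_column_descriptions` are assumptions for illustration and are not confirmed in this thread. Reading and writing the actual dataset schema in DSS is left out, so check the API documentation for your version.

```python
from copy import deepcopy


def copy_column_descriptions(source_schema, target_schema, key="comment"):
    """Return a copy of target_schema with missing descriptions filled in
    from source_schema, matching columns by name.

    Hypothetical helper: the schema layout and the "comment" key are
    assumptions, not something this thread confirms.
    """
    # Collect the non-empty descriptions of the upstream dataset by column name.
    descriptions = {
        col["name"]: col[key]
        for col in source_schema.get("columns", [])
        if col.get(key)
    }

    # Work on a copy so the caller's target schema is left untouched.
    updated = deepcopy(target_schema)
    for col in updated.get("columns", []):
        # Only fill in columns that have no description of their own.
        if not col.get(key) and col["name"] in descriptions:
            col[key] = descriptions[col["name"]]
    return updated


if __name__ == "__main__":
    # Made-up example columns, for illustration only.
    upstream = {"columns": [
        {"name": "pid", "type": "bigint", "comment": "Person identifier"},
        {"name": "syear", "type": "int", "comment": "Survey year"},
    ]}
    downstream = {"columns": [
        {"name": "pid", "type": "bigint"},
        {"name": "syear", "type": "int"},
        {"name": "age", "type": "int"},
    ]}
    for column in copy_column_descriptions(upstream, downstream)["columns"]:
        print(column["name"], "->", column.get("comment"))
```

Columns that exist only downstream (such as `age` in the example) are left without a description, and descriptions already set downstream are not overwritten.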
Author
Thanks Alex. I've tried propagating it before, but all the schema checks say everything is already propagated. Even dropping and deleting the schema of the output datasets doesn't help. The only way I can get it to work is to replace every recipe along the flow with a new one and manually re-add all the steps. It seems the column descriptions somehow don't make it through an existing recipe: even copying an existing recipe drops them.
Dataiker
After I propagate the schema changes, I do a smart reconstruction build of the final dataset in the pipeline, and then I see the descriptions that were added in the first dataset.
Author
Even a smart or forced rebuild doesn't work in my case. I'm adding the descriptions in the middle of the flow. Do the descriptions need to be at the very beginning of the flow?

Here are some screenshots:
soep_selected input dataset with column descriptions (right after adding them in the visual recipe): https://snag.gy/I5RFHV.jpg
Flow with all schemas propagated: https://snag.gy/w95LRC.jpg
soep_cleaned output dataset missing the descriptions: https://snag.gy/WZxuiH.jpg

I'm using Dataiku Version 4.2.0.