Column descriptions lost with next recipe

Wuser92 · May 2018

Is there a way to keep the column descriptions along the pipeline? If I add column descriptions at the beginning of the pipeline, it seems like I need to add them again for every output along the pipeline. Is this the case or am I doing something wrong?

Thanks,
Simon

Alex_Reutter · May 2018

Hi,

When you add a column description at the beginning of a pipeline, that's a change to the schema of that dataset. That schema change needs to be propagated to all datasets along the pipeline: https://answers.dataiku.com/1237/is-there-a-way-to-propagate-schema-changes-in-a-whole-flow
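If propagation does not carry the descriptions over, one possible workaround is to copy them by column name from the upstream schema to the downstream one. The following is only a minimal sketch, not Dataiku's propagation tool: it assumes a schema can be represented as a dict with a `columns` list and that each column's description is stored under a `comment` key. The function name `copy_column_descriptions` and the sample columns are hypothetical.

```python
from copy import deepcopy


def copy_column_descriptions(source_schema, target_schema):
    """Return a copy of target_schema with missing descriptions filled in.

    Assumption: both schemas look like
    {"columns": [{"name": ..., "type": ..., "comment": ...}, ...]}
    and "comment" holds the column description.
    """
    # Map column name -> description for every described upstream column.
    descriptions = {
        col["name"]: col["comment"]
        for col in source_schema.get("columns", [])
        if col.get("comment")
    }

    # Work on a copy so the caller's target schema is left untouched.
    result = deepcopy(target_schema)
    for col in result.get("columns", []):
        # Fill in only descriptions that are missing downstream.
        if not col.get("comment") and col["name"] in descriptions:
            col["comment"] = descriptions[col["name"]]
    return result


# Hypothetical schemas for illustration; not taken from a real project.
upstream = {
    "columns": [
        {"name": "person_id", "type": "bigint", "comment": "Unique person key"},
        {"name": "income", "type": "double", "comment": "Yearly gross income"},
    ]
}
downstream = {
    "columns": [
        {"name": "person_id", "type": "bigint"},
        {"name": "income", "type": "double"},
        {"name": "income_log", "type": "double"},
    ]
}

updated = copy_column_descriptions(upstream, downstream)
for column in updated["columns"]:
    print(column["name"], "->", column.get("comment", "(no description)"))
```

In a real project the two dicts would have to be read from and written back to the datasets, for example through the `get_schema()` and `set_schema()` methods of a dataset handle in the public API client, if your DSS version exposes them. Columns created inside a recipe, such as `income_log` here, have no upstream counterpart and would still need a manual description.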

Wuser92 · May 2018

Thanks Alex. I've tried propagating it before, but all the schema checks say everything is already propagated. Even dropping and deleting the schema of the output datasets doesn't help. The only way I can get it to work is to replace every recipe along the flow with a new one and manually add all the steps to the new recipes. It seems like the column descriptions somehow don't make it into the existing recipes: even copying an existing recipe removes the column descriptions.

Alex_Reutter · May 2018

After I propagate the schema changes, I do a smart reconstruction build of the final dataset in the pipeline, and then the final dataset shows the descriptions I added in the first dataset.

Wuser92 · May 2018

Even a smart or forced rebuild doesn't work in my case. I'm adding the descriptions in the middle of the flow. Do the descriptions need to be at the very beginning of the flow?

Here are some screenshots:
soep_selected input dataset with column descriptions (right after adding them in the visual recipe): https://snag.gy/I5RFHV.jpg
Flow with all schemas propagated: https://snag.gy/w95LRC.jpg
soep_cleaned output dataset missing the descriptions: https://snag.gy/WZxuiH.jpg

I'm using Dataiku Version 4.2.0.
