In my pipeline, shown below, I build the final (right-most) dataset every day with a forced recursive rebuild of dependent datasets.
I have a recurring issue where the data pulled by the Python jobs sometimes has a slightly different schema, and my pipeline breaks. I fix this by manually running the Python step in question; DSS then prompts me to update the schema based on the new data, after which I can re-run the scenario without error.
My question is: can I configure my forced recursive rebuild to also update the schema (if required), so that this does not cause my scenario to fail?
When automating a flow, the assumption is that you have control over the schemas of your datasets.
Note that DSS never automatically changes the schema of a dataset while running a job. Changing the schema of a dataset is a dangerous operation, which can lead to previous data becoming unreadable, especially for partitioned datasets.
You can find more information on this page: https://doc.dataiku.com/dss/latest/schemas/index.html
In your case, I would advise implementing schema control in your Python recipes, to prevent downstream schema changes.
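As an illustration, here is a minimal sketch of what schema control could look like inside a Python recipe. The dataset names and the `EXPECTED_COLUMNS` list are placeholders to adapt to your flow; the idea is to pin the output columns to a fixed list so the written schema stays stable even when the source data drifts.

```python
import dataiku

# Hypothetical fixed schema this recipe is expected to produce.
EXPECTED_COLUMNS = ["id", "event_date", "value"]

input_dataset = dataiku.Dataset("my_input")    # replace with your input dataset name
output_dataset = dataiku.Dataset("my_output")  # replace with your output dataset name

df = input_dataset.get_dataframe()

# Fail fast if the source data gained columns we did not expect,
# rather than letting a new schema propagate downstream silently.
unexpected = set(df.columns) - set(EXPECTED_COLUMNS)
if unexpected:
    raise ValueError("Unexpected columns in source data: %s" % sorted(unexpected))

# Reindex to the fixed column list: keeps column order stable and
# adds any missing expected columns as empty, so the output schema
# is identical from run to run.
df = df.reindex(columns=EXPECTED_COLUMNS)

output_dataset.write_with_schema(df)
```

With this in place, a schema change in the upstream source surfaces as an explicit recipe error (which you can handle or be alerted on in your scenario), instead of a silent schema update breaking downstream datasets.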