Added on August 23, 2021 7:12AM
Hi Team,
Suppose I have a source dataset "A" with 10 billion records.
I then used a Split recipe on one column to create 7 separate processing branches, which end in 7 final datasets (B, C, D, E, F, G, and H).
Now I want to create a single scenario that builds all 7 datasets, but I don't want dataset "A" to be rebuilt every time the final datasets are built, since that takes a long time and is unnecessary.
How can I do that?
Thanks in Advance
Hi,
To avoid rebuilding dataset A, change its build behavior. In the dataset's Settings - Advanced, set the Build Behavior to "Explicit". Dataset A will then not be built as part of a build of the final datasets.
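To illustrate the idea, here is a toy model in plain Python of how a recursive build walks upstream and stops at a dataset marked "Explicit". It is only a sketch and not Dataiku's implementation: the function name `upstream_build_plan` and the intermediate dataset names (`B_split`, `C_split`) are made up for this example, and it ignores whether a dataset is already up to date.

```python
def upstream_build_plan(target, parents, explicit):
    """Return the datasets a recursive build of `target` would visit, upstream first.

    parents:  dict mapping a dataset name to the list of its upstream datasets
    explicit: set of dataset names whose build behavior is "Explicit"
    """
    plan = []
    seen = set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for parent in parents.get(name, []):
            # An explicit-build dataset is used as-is: it is neither rebuilt
            # nor traversed further when it is only a dependency.
            if parent not in explicit:
                visit(parent)
        plan.append(name)

    # The requested target is always part of the plan, even if it is explicit.
    visit(target)
    return plan


# Hypothetical flow: A -> (split) -> B_split -> B, and likewise for C.
parents = {
    "B_split": ["A"], "B": ["B_split"],
    "C_split": ["A"], "C": ["C_split"],
}

print(upstream_build_plan("B", parents, explicit=set()))  # ['A', 'B_split', 'B']
print(upstream_build_plan("B", parents, explicit={"A"}))  # ['B_split', 'B']
```

With A marked explicit, the plan for B no longer includes A, while requesting A directly still builds it.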
Hi,
Thanks for your response.
But how can we handle this in a scenario? Can we create multiple steps, force-rebuilding only in the first step and using "Build Required Dataset" for the remaining steps?
Will that work?
Thanks in Advance
Hi,
As explained here: https://doc.dataiku.com/dss/latest/flow/building-datasets.html#preventing-a-dataset-from-being-built
If the dataset is set to explicit build, then even if you select "Build Required Dataset" in a scenario step, the dataset will not be built recursively.
If you do need to build this dataset, add another step with build mode "build this dataset only" that contains only dataset A.
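If you prefer a custom Python step over visual build steps, the same logic could be sketched roughly as below. This assumes the `build_dataset` method of `dataiku.scenario.Scenario` and its `build_mode` values `"RECURSIVE_BUILD"` and `"NON_RECURSIVE_FORCED_BUILD"`; please check them against the documentation for your DSS version. The helper `build_final_datasets` and its `rebuild_source` flag are made up for this example.

```python
# Dataset names taken from the question above.
SOURCE_DATASET = "A"
FINAL_DATASETS = ["B", "C", "D", "E", "F", "G", "H"]


def build_final_datasets(scenario, rebuild_source=False):
    """Build the seven final datasets, optionally refreshing A first.

    `scenario` is expected to be a dataiku.scenario.Scenario instance
    (or any object exposing a compatible build_dataset method).
    """
    if rebuild_source:
        # Assumed equivalent of a step with "build this dataset only" on A.
        scenario.build_dataset(SOURCE_DATASET, build_mode="NON_RECURSIVE_FORCED_BUILD")

    for name in FINAL_DATASETS:
        # Assumed equivalent of "Build Required Dataset"; with A set to
        # "Explicit", the recursion should not rebuild A.
        scenario.build_dataset(name, build_mode="RECURSIVE_BUILD")
```

Inside the scenario step you would create the object with `from dataiku.scenario import Scenario` and call `build_final_datasets(Scenario())`, passing `rebuild_source=True` only on the runs where A really needs to be refreshed.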
Let me know if you have any other questions.