Reuse of Initial Dataset in Scenarios
Hi Team,
Suppose I have a source dataset "A" with 10 billion records.
I then used a Split recipe on one column to create 7 separate processing branches, which end in 7 final datasets (B, C, D, E, F, G, and H).
Now I want to create a single Scenario that builds all 7 final datasets, but I don't want Dataset "A" to be rebuilt each time a final dataset is built. Rebuilding it takes a long time and is unnecessary.
How can I do that?
Thanks in Advance
Answers
-
Alexandru (Dataiker)
Hi,
To avoid rebuilding dataset A, change the settings on the dataset itself. Under Settings > Advanced, set the Build Behavior to "Explicit". Dataset A will then no longer be rebuilt as part of a build of the final datasets.
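The same setting can in principle be scripted instead of clicked. This is only a minimal sketch: it assumes the dataset's raw settings store the build behavior under `flowOptions.rebuildBehavior` and that `EXPLICIT` is the value matching the UI option (both are assumptions, so check them against the JSON of your own dataset). `set_explicit_build` is a hypothetical helper, and the host, API key, and project key in the usage comment are placeholders.

```python
def set_explicit_build(raw_settings):
    """Mark a dataset's raw settings dict as explicit-build, in place.

    ASSUMPTION: the build behavior lives under flowOptions.rebuildBehavior
    and "EXPLICIT" is the value that corresponds to the UI option.
    """
    flow_options = raw_settings.setdefault("flowOptions", {})
    flow_options["rebuildBehavior"] = "EXPLICIT"
    return raw_settings


# Possible usage with the dataikuapi client (all names below are placeholders):
#   import dataikuapi
#   client = dataikuapi.DSSClient("https://dss.example.com", "YOUR_API_KEY")
#   settings = client.get_project("MYPROJECT").get_dataset("A").get_settings()
#   set_explicit_build(settings.get_raw())
#   settings.save()
```

Keeping the change in a small function that only touches a dict makes it easy to inspect the result before saving anything back to DSS.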
-
sj0071992 (Neuron)
Hi,
Thanks for your response.
But how can we handle this in a Scenario? Can we create multiple steps, force a rebuild only in the first step, and use "Build Required Dataset" for the remaining steps?
Would that work?
Thanks in Advance
-
Alexandru (Dataiker)
Hi,
As explained here: https://doc.dataiku.com/dss/latest/flow/building-datasets.html#preventing-a-dataset-from-being-built
If the dataset is set to explicit build, then even if you select "Build Required Dataset", a scenario will not rebuild this dataset recursively.
If you do need to rebuild this dataset, add a separate step with build mode "build this dataset only" that contains only dataset A.
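If you prefer a Python step over visual build steps, the same layout could look roughly like the sketch below. It assumes the scenario API exposes `Scenario.build_dataset` with a `build_mode` argument, and that `NON_RECURSIVE_FORCED_BUILD` and `RECURSIVE_BUILD` are the mode names for "build this dataset only" and "Build Required Dataset" (please verify against your DSS version). `build_finals` and its `refresh_source` flag are hypothetical names used for illustration.

```python
# Build-mode names as assumed from the Dataiku scenario Python API.
THIS_DATASET_ONLY = "NON_RECURSIVE_FORCED_BUILD"  # "build this dataset only"
REQUIRED_DATASETS = "RECURSIVE_BUILD"             # "Build Required Dataset"

FINAL_DATASETS = ["B", "C", "D", "E", "F", "G", "H"]


def build_finals(scenario, source="A", finals=FINAL_DATASETS, refresh_source=False):
    """Optionally rebuild the source on its own, then build each final dataset.

    With dataset A set to explicit build, the recursive builds of B..H
    should leave A untouched unless refresh_source is True.
    """
    if refresh_source:
        scenario.build_dataset(source, build_mode=THIS_DATASET_ONLY)
    for name in finals:
        scenario.build_dataset(name, build_mode=REQUIRED_DATASETS)


# Inside a scenario's "Execute Python code" step one would call:
#   from dataiku.scenario import Scenario
#   build_finals(Scenario(), refresh_source=False)
```

Passing the scenario object in as a parameter keeps the function easy to dry-run outside DSS with a stand-in object that just records the calls.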
Let me know if you have any other questions.