Reuse of Initial Dataset in Scenarios

sj0071992

Hi Team,

Suppose I have a source dataset "A" with 10 billion records.

I then use a split recipe on one column to create 7 separate processing branches, which end in 7 final datasets (B, C, D, E, F, G, and H).

Now I want a single scenario that builds all 7 datasets, but I don't want dataset "A" to be rebuilt every time a final dataset is built: rebuilding it takes a long time and is unnecessary.

How can I do that?

Thanks in Advance

3 Replies
AlexT
Dataiker

Hi,

To avoid rebuilding dataset A, change the settings on that dataset: under Settings > Advanced, set the Build Behavior to "Explicit". Dataset A is then no longer rebuilt as part of a recursive build of a final dataset.

sj0071992
Author

Hi,

Thanks for your response.

But how can we handle this in a scenario? Can we create multiple steps, force a rebuild only in the first step, and use "Build Required Dataset" for the remaining steps?

Will it work?

Thanks in Advance

AlexT
Dataiker

Hi,

This is explained here: https://doc.dataiku.com/dss/latest/flow/building-datasets.html#preventing-a-dataset-from-being-built

If the dataset is set to explicit build, a scenario will not rebuild it recursively, even if you select "Build Required Dataset".
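To illustrate the idea, here is a toy model of a recursive build plan in plain Python. It is not how DSS is implemented; it only assumes that the upstream walk stops at any dataset flagged as explicit, unless that dataset is the one you asked to build. The intermediate dataset name `B_prepared` is hypothetical.

```python
def plan_recursive_build(target, upstream, explicit):
    """Toy model: list the datasets a recursive build of `target` would
    consider, upstream first. Explicit datasets are skipped unless they
    are the requested target, and the walk does not go past them."""
    plan = []
    seen = set()

    def visit(name, is_target):
        if name in seen:
            return
        seen.add(name)
        if name in explicit and not is_target:
            return  # explicit dataset: not part of a recursive build
        for parent in upstream.get(name, []):
            visit(parent, False)
        plan.append(name)

    visit(target, True)
    return plan


# Hypothetical flow for one of the seven branches: A -> B_prepared -> B
upstream = {"B": ["B_prepared"], "B_prepared": ["A"], "A": []}

print(plan_recursive_build("B", upstream, explicit=set()))  # ['A', 'B_prepared', 'B']
print(plan_recursive_build("B", upstream, explicit={"A"}))  # ['B_prepared', 'B']
print(plan_recursive_build("A", upstream, explicit={"A"}))  # ['A']
```

With A flagged as explicit, building B no longer pulls A into the plan, but asking for A directly still builds it.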

If you do need to rebuild dataset A, add a separate step with build mode "build this dataset only" that contains only dataset A.
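If you prefer a custom Python step over visual build steps, the same layout could be sketched as below. This is only a sketch under assumptions: it assumes the scenario object exposes `build_dataset(name, build_mode=...)` as in the `dataiku.scenario` API, and that `NON_RECURSIVE_FORCED_BUILD` and `RECURSIVE_BUILD` correspond to "build this dataset only" and "Build Required Dataset". The helper `run_builds` and its `rebuild_source` flag are names made up for this example.

```python
FINAL_DATASETS = ["B", "C", "D", "E", "F", "G", "H"]


def run_builds(scenario, rebuild_source=False):
    """Build the seven final datasets; rebuild A only when explicitly asked.

    `scenario` is any object with a build_dataset(name, build_mode=...) method.
    """
    if rebuild_source:
        # Assumed equivalent of a "build this dataset only" step on A
        scenario.build_dataset("A", build_mode="NON_RECURSIVE_FORCED_BUILD")
    for name in FINAL_DATASETS:
        # Assumed equivalent of "Build Required Dataset"; with A set to
        # explicit build, this should not pull A into the build
        scenario.build_dataset(name, build_mode="RECURSIVE_BUILD")


# Inside a DSS custom Python scenario step (assumed usage):
#   from dataiku.scenario import Scenario
#   run_builds(Scenario(), rebuild_source=False)
```

Keeping the rebuild of A behind a flag means the regular run never touches the 10 billion record dataset unless you ask for it.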

Let me know if you have any other questions.