Reuse of Initial Dataset in Scenarios

Partner, Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Dataiku DSS Developer, Neuron 2022, Neuron 2023 Posts: 131 Neuron

Hi Team,

Suppose I have a source dataset "A" with 10 billion records.

I then use a Split recipe on one column to create 7 separate processing lines, which end in 7 final datasets (B, C, D, E, F, G, and H).

Now I want to create a single scenario that builds all 7 final datasets, but I don't want to rebuild dataset "A" each time a final dataset is built: it takes a long time and is unnecessary.

How can I do that?

Thanks in Advance


Answers

  • Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,254 Dataiker

    Hi,

    To avoid rebuilding dataset A, change its build settings. Under Settings > Advanced, set the Build Behavior to "Explicit". Dataset A is then not rebuilt as part of a recursive build of the final datasets.

    [Attached screenshot: Screenshot 2021-08-23 at 08.28.41.png]
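To illustrate the idea, here is a toy model in plain Python of how a recursive build might walk upstream and stop at a dataset marked "Explicit". It is only a sketch: it does not call DSS, it ignores recipes and up-to-date checks, and the names `datasets_to_build`, `parents`, and `upstream_of_A` are hypothetical, made up for this example.

```python
def datasets_to_build(target, parents, explicit):
    """Toy model: list the datasets a recursive build of `target` would touch.

    parents  -- dict mapping a dataset name to the datasets directly upstream of it
    explicit -- set of dataset names whose build behavior is "Explicit"
    """
    order = []

    def visit(name):
        if name in order:
            return
        for parent in parents.get(name, []):
            if parent in explicit:
                # Explicit datasets are skipped, and nothing upstream
                # of them is walked either.
                continue
            visit(parent)
        order.append(name)  # upstream datasets come first

    visit(target)
    return order


# Hypothetical, simplified flow: upstream_of_A -> A -> B
parents = {"A": ["upstream_of_A"], "B": ["A"]}

print(datasets_to_build("B", parents, explicit=set()))  # A is included
print(datasets_to_build("B", parents, explicit={"A"}))  # A is left alone
```

With A in the `explicit` set, only B is returned, which mirrors the effect described above.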

  • Partner, Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Dataiku DSS Developer, Neuron 2022, Neuron 2023 Posts: 131 Neuron

    Hi,

    Thanks for your response.

    But how do we handle this in a scenario? Can we create multiple steps, force a rebuild only in the first step, and use "Build Required Dataset" for the remaining steps?

    Will it work?

    Thanks in Advance

  • Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,254 Dataiker

    Hi,

    As explained here: https://doc.dataiku.com/dss/latest/flow/building-datasets.html#preventing-a-dataset-from-being-built

    If the dataset is set to explicit build, a scenario step that uses "Build Required Dataset" will not rebuild it recursively.

    If you do need to rebuild dataset A, add a separate step with build mode "build this dataset only" that contains only dataset A.
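The same step layout could also be written as a custom Python step in the scenario. The sketch below is an assumption rather than a confirmed recipe: it presumes that `dataiku.scenario.Scenario.build_dataset` accepts a `build_mode` argument with the values `"NON_RECURSIVE_FORCED_BUILD"` and `"RECURSIVE_BUILD"`, which should be checked against the API documentation for your DSS version. `run_builds` and `RecordingScenario` are hypothetical helpers; the recorder only lets the logic be tried outside DSS.

```python
FINAL_DATASETS = ["B", "C", "D", "E", "F", "G", "H"]


def run_builds(scenario, rebuild_source=False):
    """Build the final datasets; rebuild A only when explicitly requested."""
    if rebuild_source:
        # Counterpart of a "build this dataset only" step containing A.
        scenario.build_dataset("A", build_mode="NON_RECURSIVE_FORCED_BUILD")
    for name in FINAL_DATASETS:
        # Counterpart of a "Build Required Dataset" step; with A set to
        # "Explicit", this is expected to leave A untouched.
        scenario.build_dataset(name, build_mode="RECURSIVE_BUILD")


class RecordingScenario:
    """Hypothetical stand-in that records calls instead of running builds."""

    def __init__(self):
        self.calls = []

    def build_dataset(self, dataset_name, build_mode="RECURSIVE_BUILD"):
        self.calls.append((dataset_name, build_mode))


# Dry run outside DSS. Inside a scenario's Python step one would instead
# pass a real handle (assumed: `from dataiku.scenario import Scenario`).
dry_run = RecordingScenario()
run_builds(dry_run, rebuild_source=True)
for call in dry_run.calls:
    print(call)
```

Keeping the rebuild of A behind a flag means the default run only builds the seven final datasets.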

    Let me know if you have any other questions.
