Reuse of Initial Dataset in Scenarios

sj0071992
sj0071992 Partner, Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Dataiku DSS Developer, Neuron 2022, Neuron 2023 Posts: 131 Neuron

Hi Team,

Suppose I have a source dataset "A" with 10 billion records.

After that, I use a Split recipe on one column to create 7 separate processing branches, which end in 7 final datasets: B, C, D, E, F, G, and H.

Now I want a single scenario that builds all 7 final datasets without rebuilding dataset "A" each time. Rebuilding A takes a long time and is unnecessary on every run.

How can I do that?

Thanks in advance

Answers

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,209 Dataiker

    Hi,

    To avoid rebuilding dataset A, change its build behavior. In the dataset's Settings > Advanced tab, set Build Behavior to "Explicit"; dataset A will then no longer be built as part of a build of your final datasets.
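    To make the effect of this setting concrete, here is a toy model in plain Python (not Dataiku code) of how a recursive build could walk upstream through the Flow and stop at a dataset marked Explicit. The graph mirrors the Flow described in the question; the `split_*` intermediate names and the `plan_build` function are hypothetical. DSS's real build engine also skips datasets that are already up to date, which this sketch ignores.

```python
# Toy model of a Flow (not Dataiku code): each dataset maps to its direct inputs.
FINALS = ["B", "C", "D", "E", "F", "G", "H"]
UPSTREAM = {"A": []}
for final in FINALS:
    UPSTREAM[f"split_{final}"] = ["A"]      # hypothetical Split recipe output
    UPSTREAM[final] = [f"split_{final}"]    # end of that processing line


def plan_build(targets, upstream, explicit=frozenset()):
    """Return the datasets a recursive build would run, in dependency order.

    Datasets listed in `explicit` are never pulled in as upstream
    dependencies; they are built only when requested as a target.
    """
    order, seen = [], set()

    def visit(name, requested):
        if name in seen:
            return
        if name in explicit and not requested:
            return  # Explicit: skipped when reached only as a dependency
        seen.add(name)
        for parent in upstream[name]:
            visit(parent, requested=False)
        order.append(name)

    for target in targets:
        visit(target, requested=True)
    return order


print("A" in plan_build(FINALS, UPSTREAM))                  # True
print("A" in plan_build(FINALS, UPSTREAM, explicit={"A"}))  # False
```

    Requesting A directly, as in `plan_build(["A"], UPSTREAM, explicit={"A"})`, still builds it, which matches the idea that an Explicit dataset is only built when explicitly asked for.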

    [Screenshot (2021-08-23): dataset Settings > Advanced, Build Behavior option]

  • sj0071992
    sj0071992 Partner, Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Dataiku DSS Developer, Neuron 2022, Neuron 2023 Posts: 131 Neuron

    Hi,

    Thanks for your response.

    But how can we handle this in a scenario? Could we create multiple steps, force-rebuilding only in the first step and using "Build Required Dataset" in the remaining steps?

    Will it work?

    Thanks in advance

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,209 Dataiker

    Hi,

    As explained in the documentation (https://doc.dataiku.com/dss/latest/flow/building-datasets.html#preventing-a-dataset-from-being-built), if a dataset's build behavior is Explicit, a scenario will not build it recursively, even when the build step uses "Build Required Dataset".

    If you do need to rebuild dataset A, add a separate step with the build mode "Build this dataset only" and select only dataset A in it.
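    The same two-step setup could also be expressed as a custom Python scenario step. This sketch assumes the `build_dataset(dataset_name, build_mode=...)` method of `dataiku.scenario.Scenario` and the build-mode names `NON_RECURSIVE_FORCED_BUILD` and `RECURSIVE_BUILD`, which appear to correspond to "Build this dataset only" and "Build Required Dataset"; please verify both against the Python API documentation for your DSS version. The `run_build_steps` helper, the `rebuild_source` flag and the `RecordingScenario` stand-in are hypothetical, added so the logic can be dry-run outside DSS.

```python
# Dataset names taken from the question; "A" is assumed to have its
# Build behavior set to "Explicit" in Settings > Advanced.
SOURCE = "A"
FINALS = ["B", "C", "D", "E", "F", "G", "H"]


def run_build_steps(scenario, rebuild_source=False):
    """Build the seven final datasets, rebuilding A only when asked to.

    `scenario` must expose build_dataset(name, build_mode=...), as
    dataiku.scenario.Scenario is assumed to do inside DSS.
    """
    if rebuild_source:
        # Equivalent of a separate "Build this dataset only" step for A.
        scenario.build_dataset(SOURCE, build_mode="NON_RECURSIVE_FORCED_BUILD")
    for name in FINALS:
        # "Build Required Dataset": an Explicit A is not rebuilt here.
        scenario.build_dataset(name, build_mode="RECURSIVE_BUILD")


class RecordingScenario:
    """Hypothetical stand-in that records calls, for dry runs outside DSS."""

    def __init__(self):
        self.calls = []

    def build_dataset(self, dataset_name, build_mode="RECURSIVE_BUILD"):
        self.calls.append((dataset_name, build_mode))


# Inside DSS (assumption), a custom Python step would instead run:
#   from dataiku.scenario import Scenario
#   run_build_steps(Scenario())
dry_run = RecordingScenario()
run_build_steps(dry_run)
print(dry_run.calls[0])  # ('B', 'RECURSIVE_BUILD')
```

    Keeping the rebuild of A behind a flag mirrors the advice above: the everyday run only builds B to H, and A is rebuilt only when that separate step is deliberately enabled.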

    Let me know if you have any other questions.
