Force build a single dataset

ben_p
ben_p Neuron 2020, Registered, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant Posts: 143 ✭✭✭✭✭✭✭

Hi there,

Sometimes I need to build a single dataset, which may be at the end of a flow. DSS will not always rebuild a dataset that is already populated, even when I know its data is out of date, so I use a force refresh. Is it possible to do a force refresh on a single dataset, rather than having the force refresh also rebuild all of its dependencies?

Ben

Answers

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,601 Neuron

    @ben_p

    Just a thought. I do the following to control the scope of my "forced rebuilds".

    Go into the advanced settings of the dataset just before the recipe you want to run, and set its rebuild behavior to explicit or write protect. You can then use the recursive force rebuild, and it will not rebuild any of the datasets prior to the one that has been set to explicit or write protect.

    Hope that helps.

    P.S. One thing I've also learned is that the record counts and update dates on the project flow page do not always update automatically. I often have to hard-refresh the flow page (Command+Shift+R or Ctrl+Shift+R) to see the actual count and last rebuild date of a dataset. This sometimes makes me think that my dataset has not been rebuilt when it has.

  • Marlan
    Marlan Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant, Neuron 2023 Posts: 321 Neuron

    Hi @ben_p,

    Just to confirm: are you seeing datasets not get rebuilt when you choose the "Build only this dataset" option for the Build mode?

    I don't think we have experienced this. At least I'm not aware of it. We mostly are building SQL datasets and certainly "Build required datasets" won't rebuild SQL datasets when underlying data is changed (understandable as this is not as easy to determine as with files - although it'd be awesome if it was smart enough to only rebuild datasets that needed it).

    Marlan
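
For anyone who wants to script this rather than use the build dialog, here is a minimal sketch of a forced, non-recursive build of one dataset through the public Python API. It assumes the job definition shape accepted by `start_job` on a `dataikuapi` project handle, with `NON_RECURSIVE_FORCED_BUILD` as the job type and `"NP"` as the partition value for a non-partitioned dataset; please check both against the API documentation for your DSS version. The helper names `build_job_definition` and `force_build_single`, and the host, API key, project key, and dataset name in the usage comment, are placeholders made up for illustration.

```python
# Job types as I understand them from the public API docs; treat this set
# as an assumption and verify it against your DSS version.
VALID_JOB_TYPES = {
    "NON_RECURSIVE_FORCED_BUILD",    # force-build only the target dataset
    "RECURSIVE_BUILD",               # build whatever DSS considers out of date
    "RECURSIVE_FORCED_BUILD",        # force-rebuild the target and everything upstream
    "RECURSIVE_MISSING_ONLY_BUILD",  # build only upstream datasets that are empty
}


def build_job_definition(dataset_name, job_type="NON_RECURSIVE_FORCED_BUILD", partition="NP"):
    """Return a job definition dict that targets a single dataset.

    "NP" is assumed here to mean "not partitioned".
    """
    if job_type not in VALID_JOB_TYPES:
        raise ValueError("Unknown job type: %s" % job_type)
    return {
        "type": job_type,
        "outputs": [
            {"id": dataset_name, "type": "DATASET", "partition": partition}
        ],
    }


def force_build_single(project, dataset_name):
    """Start a forced build of one dataset without rebuilding its dependencies.

    `project` is expected to be a project handle exposing start_job(definition),
    such as the object returned by DSSClient.get_project(...).
    """
    return project.start_job(build_job_definition(dataset_name))


# Hypothetical usage (all values are placeholders):
# import dataikuapi
# client = dataikuapi.DSSClient("https://dss.example.com:11200", "YOUR_API_KEY")
# project = client.get_project("MYPROJECT")
# job = force_build_single(project, "my_output_dataset")
```

The project handle is passed in rather than created inside the helper, so the same function can be used from a scenario step, a notebook, or an external script.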
