Recommendation with build options to control rebuild propagation

me2 Registered Posts: 48 ✭✭✭✭✭

I have a flow in which not every dataset needs to be rebuilt, and I would love your recommendation on how to implement this while keeping it simple.

I want to build and launch Flow A with scenario X. Normally I add a Build scenario step with Build mode: "force-rebuild dataset and dependencies".


Flow A uses a dataset from Flow B, which is in the same project. Because of the time it takes to rebuild and the changes a rebuild causes in the data output, I don't want scenario X to rebuild Flow B. I only want to use the dataset as it was last built (as-is); another scenario will rebuild it at a lower frequency.

Flow A also uses a dataset from Flow C, which is in another project. For the same reasons, I don't want scenario X to rebuild Flow C. I only want to use the dataset as it was last built (as-is); another scenario will rebuild it at a lower frequency.

For the dataset from Flow B, I thought of an implementation using a Sync and the "Flow - Rebuild behavior" setting, but I can't help thinking there might be an easier way.
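To make the Sync plus rebuild-behavior idea concrete, here is a minimal toy model in plain Python. It does not call any Dataiku API. It only assumes that a forced recursive build walks upstream from the target and stops at any dataset whose rebuild behavior is set so that it is built only when explicitly targeted. All dataset names (`a_output`, `b_synced`, and so on) and the function `datasets_to_rebuild` are hypothetical.

```python
def datasets_to_rebuild(upstream, target, explicit_only=frozenset()):
    """Toy model: which datasets a forced recursive build of `target` rebuilds.

    upstream      -- dict mapping each built dataset to its input datasets
    explicit_only -- datasets assumed to block propagation unless they are
                     themselves the build target
    """
    to_build = set()
    stack = [target]
    while stack:
        name = stack.pop()
        if name in to_build:
            continue
        # A protected dataset is used as-is and nothing above it is visited.
        if name != target and name in explicit_only:
            continue
        inputs = upstream.get(name, [])
        if not inputs:
            # Source dataset: nothing produces it, so there is nothing to build.
            continue
        to_build.add(name)
        stack.extend(inputs)
    return to_build


# Hypothetical flow: Flow A reads a synced copy of Flow B's output.
flow = {
    "a_output": ["a_prepared", "b_synced"],
    "a_prepared": ["a_raw"],
    "b_synced": ["b_output"],
    "b_output": ["b_raw"],
}

unprotected = datasets_to_rebuild(flow, "a_output")
protected = datasets_to_rebuild(flow, "a_output", explicit_only={"b_synced"})
```

In this sketch, `unprotected` includes `b_synced` and `b_output`, while `protected` contains only the Flow A datasets, which is the behavior I am after for scenario X.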


Another option is to split Flow A in the scenario into A and A'. Use a Build scenario step with Build mode: "force-rebuild dataset and dependencies" for A, then add another Build scenario step with Build mode: "build only this dataset" for A'. The drawback is that any datasets after A' will require additional scenario steps to finish building.
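The A / A' split could also be written as one custom Python scenario step instead of several visual Build steps. The sketch below assumes the scenario object exposes `build_dataset(dataset_name, build_mode=...)` and that the mode strings `RECURSIVE_FORCED_BUILD` and `NON_RECURSIVE_FORCED_BUILD` correspond to "force-rebuild dataset and dependencies" and "build only this dataset"; please check both against the API reference for your Dataiku version. The dataset names and the helper `run_build_plan` are hypothetical.

```python
# Assumed build-mode strings; verify them against your Dataiku version.
FORCED_RECURSIVE = "RECURSIVE_FORCED_BUILD"    # force-rebuild dataset and dependencies
ONLY_THIS = "NON_RECURSIVE_FORCED_BUILD"       # build only this dataset


def run_build_plan(scenario, plan):
    """Build each (dataset_name, build_mode) pair in order.

    `scenario` is any object with a build_dataset(name, build_mode=...) method.
    """
    for dataset_name, build_mode in plan:
        scenario.build_dataset(dataset_name, build_mode=build_mode)


# Hypothetical plan: A is rebuilt with its dependencies, then A' and each
# dataset after it is built on its own so nothing propagates into Flow B.
PLAN = [
    ("a_prepared", FORCED_RECURSIVE),
    ("a_joined_with_b", ONLY_THIS),
    ("a_output", ONLY_THIS),
]

# Inside a custom Python scenario step this would be called roughly as:
#   from dataiku.scenario import Scenario
#   run_build_plan(Scenario(), PLAN)
```

Keeping the plan as a list at least puts the "additional steps after A'" in one place, although each downstream dataset still has to be listed by hand.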


Also, I can't use "build sections" for Flow B, for two reasons: 1) the way I currently use Flow Zones might cause other datasets I want built not to get built, and 2) our current version of Dataiku doesn't support building sections.


For Flow C, since the dataset can only be built from its source project, I just have to link to the output dataset.

Is there a recommended way to use the datasets from Flow B and Flow C in Flow A? What are the limitations?

tempsnip2.png

I found a great article that has helped me.

concept-dataset-building-strategies


Operating system used: Windows

Answers

  • me2
    me2 Registered Posts: 48 ✭✭✭✭✭

    Thank you for the reply. You are right, the best solution for Flow B is to use the version 12.0+ features for building in zones.

    Since our upgrade to V12 is still a few weeks away, I am going to implement a solution that uses a sync to copy the dataset from Flow B, then uses the rebuild behavior options plus a dedicated scenario step to prevent the whole of Flow B from being rebuilt.

    Once we upgrade to V12, I will implement the zone build.
