Build flow between two datasets

Recursive builds are a great feature. "Build flow outputs reachable from here" is a great feature. But in large projects, I often just want to constrain Dataiku to build a certain segment of my flow and nothing else. No upstream dependencies. No downstream outputs. Just start at one point, end at another, and execute in order with all the dependency-calculation goodness of the Dataiku DAG.

Screenshot 2021-07-28 104417.jpg

For a simple example, I'd love to be able to select the leftmost and rightmost datasets here and with one click "run everything between" these datasets, resulting in my rightmost dataset being built, but nothing upstream of the leftmost dataset being rebuilt (since those take hours to finish and Dataiku's dependency management often triggers them to build even when nothing has changed). The dataset at the bottom wouldn't be rebuilt in this scenario.
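
To make the request concrete, here is a rough sketch of the selection logic I have in mind, written in plain Python rather than against any Dataiku API. Everything in it is an assumption for illustration: the flow is a hypothetical dict mapping each dataset to its direct downstream datasets, the names (`raw`, `left`, `bottom`, and so on) are made up, and I'm assuming the starting dataset itself is used as-is rather than rebuilt. The idea is to keep only the items that are both downstream of the start and upstream of the end, then order them so dependencies run first.

```python
from collections import deque


def reachable(graph, start):
    """Return every node reachable from start, including start itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def build_plan_between(graph, start, end):
    """List the datasets to build between start and end, in dependency order."""
    # Reverse every edge so we can walk upstream from the end dataset.
    reverse = {}
    for node, children in graph.items():
        for child in children:
            reverse.setdefault(child, []).append(node)

    # Keep only nodes that are downstream of start AND upstream of end.
    between = reachable(graph, start) & reachable(reverse, end)
    between.discard(start)  # assumption: the start dataset is not rebuilt

    # Kahn's topological sort, restricted to the selected nodes.
    indegree = {n: 0 for n in between}
    for n in between:
        for child in graph.get(n, ()):
            if child in between:
                indegree[child] += 1
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for child in sorted(graph.get(n, ())):
            if child in between:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
    return order


# Hypothetical flow: "raw" is the slow upstream part, "bottom" is a side branch.
flow = {
    "raw": ["left"],
    "left": ["mid_a", "mid_b", "bottom"],
    "mid_a": ["right"],
    "mid_b": ["right"],
    "right": ["downstream"],
}
print(build_plan_between(flow, "left", "right"))  # ['mid_a', 'mid_b', 'right']
```

In this toy flow, `raw`, `bottom`, and `downstream` are all left out of the plan, which is the behavior I'm asking for.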

For more general cases, it would also be really cool to get a preview of the sections of the flow that will be built before each run, with the option to add and remove items from that list. That way, as an alternative workflow, when selecting all of these and building, when using "Build flow outputs reachable from here", or when building recursively, I can just deselect the sections I don't want rebuilt.

2 Comments
ktgross15
Dataiker
Status changed to: In Backlog

Thanks, this is in our backlog and I will note your interest - note that we also have this community product idea which is generally the same concept: https://community.dataiku.com/t5/Product-Ideas/Start-Flow-from-specific-dataset/idi-p/11809

tgb417
Neuron

I'm wondering if doing this in a flow zone makes sense.

That way, one could rebuild an entire flow zone as a way of selecting the scope of the rebuild.