Force Rebuild + Build Required Dataset vs Spark Pipeline

Assume I have the six datasets below:
source dataset
intermediate 1 dataset
intermediate 2 dataset
intermediate 3 dataset
intermediate 4 dataset
target dataset
These are the steps I want to set up in my scenario (the sketch after this list shows the equivalent Python API calls):
1. Force-rebuild intermediate 1 dataset
2. Build intermediate 3 dataset with "Build required datasets" (this will also build intermediate 2)
3. Build target dataset with "Build required datasets" (this will also build intermediate 4)
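For reference, this is how I would script the same three steps with the Dataiku Python API (a minimal sketch: MY_PROJECT and the dataset names are placeholders, and the comments map the documented job types onto the scenario build modes as I understand them):

```python
import dataiku

client = dataiku.api_client()  # inside DSS; use dataikuapi.DSSClient(host, api_key) elsewhere
project = client.get_project("MY_PROJECT")  # placeholder project key

# Step 1: force-rebuild only intermediate 1 ("Force rebuild" mode)
job = project.new_job("NON_RECURSIVE_FORCED_BUILD")
job.with_output("intermediate_1")
job.start_and_wait()

# Step 2: "Build required datasets" up to intermediate 3
# (smart rebuild, so intermediate 2 is also built if needed)
job = project.new_job("RECURSIVE_BUILD")
job.with_output("intermediate_3")
job.start_and_wait()

# Step 3: "Build required datasets" up to the target
# (intermediate 4 is also built if needed)
job = project.new_job("RECURSIVE_BUILD")
job.with_output("target")
job.start_and_wait()
```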
Without Spark pipelines, step #1 builds intermediate 1, step #2 builds intermediate 2 and intermediate 3, and step #3 builds intermediate 4 and the target dataset.
But with Spark pipelines enabled, step #3 sometimes rebuilds intermediate 3 as well, which I think is redundant. Is there a way to prevent intermediate 3 from being rebuilt? I have tried disabling "Can this recipe be merged in an existing recipes pipeline?" but that does not seem to work.
One way that I know will work is setting intermediate 3's rebuild behavior to explicit. But if you can suggest another way that does not rely on the explicit rebuild setting, that would be great. For completeness, a sketch of that workaround follows.
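This is the explicit-rebuild workaround scripted through the public Python API. MY_PROJECT is a placeholder, and the flowOptions/rebuildBehavior keys are my assumption about where the raw dataset settings store this flag, so please verify against your DSS version:

```python
import dataiku

client = dataiku.api_client()  # inside DSS; use dataikuapi.DSSClient(host, api_key) elsewhere
project = client.get_project("MY_PROJECT")  # placeholder project key

# Mark intermediate 3 as explicit-rebuild so recursive "build required"
# jobs stop at it instead of recomputing it.
# NOTE: flowOptions/rebuildBehavior is an assumption about the raw
# settings layout; inspect settings.get_raw() on your instance first.
settings = project.get_dataset("intermediate_3").get_settings()
settings.get_raw()["flowOptions"]["rebuildBehavior"] = "EXPLICIT_CHECKPOINT"
settings.save()
```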
Operating system used: Windows 10
Answers
-
Alexandru (Dataiker)
Hi,
What does the behavior look like now if you have Spark pipelines enabled and you perform a recursive rebuild of the target dataset?
Can you try adding the option "Virtualizable in build" on intermediate 3 dataset and see if this yields the behavior you are looking for? A sketch for toggling it through the API follows.
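If it is easier to test, the same checkbox can be flipped programmatically (a minimal sketch; MY_PROJECT is a placeholder, and the flowOptions.virtualizable key is an assumption about where the raw dataset settings keep this option, so check settings.get_raw() on your instance):

```python
import dataiku

client = dataiku.api_client()
project = client.get_project("MY_PROJECT")  # placeholder project key

# Enable "Virtualizable in build" on intermediate 3 (assumed key)
settings = project.get_dataset("intermediate_3").get_settings()
settings.get_raw()["flowOptions"]["virtualizable"] = True
settings.save()
```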
Thanks,
Alex
-
"What does the behavior look like now if you have Spark pipelines enabled and you perform a recursive rebuild of the target dataset?"
You mean manually doing the forced recursive rebuild by right-clicking on the target dataset?
"Can you try adding the option 'Virtualizable in build' on intermediate 3 dataset and see if this yields the behavior you are looking for?"
I have tried it per your suggestion, but it did not work.