Dependency between Projects for Building DataSets

Solved!
RanjithJose
Level 2

Is there an option to set dependencies when triggering jobs, as we do with DataStage ETL jobs? For example, setting a predecessor for a job that is waiting to be executed in the queue.

Ex: Imagine Project B is being built or refreshed. Within Project B there is a dataset that gets refreshed only when Project A is built/refreshed. Is there an option to automatically have the predecessor job executed first and then the current job? In this example: build/run/refresh Project A first, and then build/run/refresh Project B only if Project A's build/run/refresh was a success.


Operating system used: Windows

1 Solution

Same thing: share Dataset 1 from Project A with Project B and then use a "Trigger on dataset change" to start your Project B scenario. There is no built-in way to build a flow zone as part of a scenario (although in v12 you can do it in the flow from the GUI), but you can manually create scenarios that match your flow zones (you will have to maintain them manually). Additionally, in v12.1 Dataiku added "Stop at zone boundary" to the Build scenario step (see below), which should allow you to build dynamic flow-zone scenarios.

[Screenshot: Build scenario step showing the "Stop at zone boundary" option]


4 Replies
Turribeach

Projects don't get built; you build datasets using scenarios. As such, you can add triggers to scenarios as follows:

  1. Trigger on Dataset change
  2. Trigger on SQL Query change
  3. Trigger after Scenario
  4. Custom Python trigger
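Option 4 is the most flexible. As a hedged sketch of what such a trigger could look like (the timestamps and how you fetch them are assumptions; only `Trigger.fire()` is the standard call), the decision logic is kept in a plain function so it can be tested outside Dataiku:

```python
# Sketch of a custom Python trigger (assumed logic; adapt to your flow).
# The decision is a plain function so it is testable outside Dataiku.

def should_fire(upstream_build_time, last_fire_time):
    """Fire only if the upstream dataset was rebuilt since the last firing.

    Both arguments are comparable timestamps (e.g. epoch millis) or None.
    """
    if upstream_build_time is None:
        return False  # upstream was never built; nothing to react to
    if last_fire_time is None:
        return True   # never fired before; catch up now
    return upstream_build_time > last_fire_time

# Inside a Dataiku custom trigger you would use something like the
# following (how you obtain the two timestamps depends on your setup):
#
#   from dataiku.scenario import Trigger
#   t = Trigger()
#   if should_fire(upstream_build_time, last_fire_time):
#       t.fire()
```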

You can add multiple triggers, but each trigger fires individually, and it's not currently possible to logically combine scenario triggers (upvote this Product Idea if you want that functionality). If you have multiple datasets being built in a project scenario and want to confirm all of them have been built before you start another project's scenario, you can add a dummy Python recipe at the end of your flow: add all the end-of-flow datasets as inputs and build a dummy output dataset in Python, which you can then use with a "Trigger on dataset change" in the other project's scenario.
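A minimal sketch of that dummy recipe (the dataset names `final_a`, `final_b`, and `build_sentinel` are made up for illustration): the core just summarises the input datasets into one tiny output, and the Dataiku I/O around it is shown as comments.

```python
# Sketch of the "dummy" end-of-flow recipe (dataset names are assumed).
# It produces one tiny dataset whose content changes whenever the inputs
# are rebuilt, so it can drive a "Trigger on dataset change" downstream.
import pandas as pd
from datetime import datetime, timezone

def build_sentinel(input_frames):
    """Summarise each final dataset into one row of a sentinel dataset."""
    rows = [
        {
            "dataset": name,
            "row_count": len(df),
            "checked_at": datetime.now(timezone.utc).isoformat(),
        }
        for name, df in input_frames.items()
    ]
    return pd.DataFrame(rows)

# In an actual Dataiku Python recipe you would wrap this with the
# standard dataset API, e.g.:
#   import dataiku
#   frames = {n: dataiku.Dataset(n).get_dataframe()
#             for n in ["final_a", "final_b"]}
#   dataiku.Dataset("build_sentinel").write_with_schema(build_sentinel(frames))
```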

So, in summary, the easiest solution for your requirement is to use a "Trigger after scenario" trigger to start scenario 1 in Project B after scenario 2 in Project A completes.

RanjithJose
Level 2
Author

Thank you, @Turribeach!

My question was more about building datasets or flow zones that belong to different projects.

Ex: Dataset 1 in flow zone 1 belongs to Project A, and Dataset 1 is shared with Project B's flow zone 2. While building flow zone 2, I want to trigger a build of flow zone 1 in Project A. I ask this because, since Dataset 1 is shared between projects, I want Dataset 1 to contain the latest information/data when I build datasets from flow zone 2 in Project B.

 


RanjithJose
Level 2
Author

Thank you! This helps in a way.

