Dependency between Projects for Building DataSets

Solved!
RanjithJose
Level 2

Is there an option to set dependencies when triggering jobs, as we do with DataStage ETL jobs? For example, setting a predecessor for a job that is waiting to be executed in the queue.

Ex: Imagine Project B is being built or refreshed. Within Project B there is a dataset that gets refreshed only when Project A is built/refreshed. Is there an option to automatically have the predecessor job executed first and then the current job? In this example: build/run/refresh Project A first, and then build/run/refresh Project B only if Project A's build/run/refresh was a success.


Operating system used: Windows

1 Solution

Same thing: share Dataset 1 from Project A with Project B and then use a "Trigger on dataset change" to start your Project B scenario. There is no built-in way to build a flow zone as part of a scenario (although in v12 you can do it in the flow from the GUI), but you can manually create scenarios that match your flow zones (you will have to maintain them manually). Additionally, in v12.1 Dataiku added "Stop at zone boundary" to the Build scenario step (see below), which should allow you to build dynamic flow-zone scenarios.

[Screenshot: Build scenario step showing the "Stop at zone boundary" option]


4 Replies
Turribeach

Projects don't get built; you build datasets using scenarios. As such, you can add triggers to scenarios as follows:

  1. Trigger on Dataset change
  2. Trigger on SQL Query change
  3. Trigger after Scenario
  4. Custom Python trigger
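Option 4 is the most flexible. As a hedged sketch of what such a trigger could look like (the timestamps and how you fetch them are assumptions; only `Trigger.fire()` is the standard call), the decision logic is kept in a plain function so it can be tested outside Dataiku:

```python
# Sketch of a custom Python trigger (assumed logic; adapt to your flow).
# The decision is a plain function so it is testable outside Dataiku.

def should_fire(upstream_build_time, last_fire_time):
    """Fire only if the upstream dataset was rebuilt since the last firing.

    Both arguments are comparable timestamps (e.g. epoch millis) or None.
    """
    if upstream_build_time is None:
        return False  # upstream was never built; nothing to react to
    if last_fire_time is None:
        return True   # never fired before; catch up now
    return upstream_build_time > last_fire_time

# Inside a Dataiku custom trigger you would use something like the
# following (how you obtain the two timestamps depends on your setup):
#
#   from dataiku.scenario import Trigger
#   t = Trigger()
#   if should_fire(upstream_build_time, last_fire_time):
#       t.fire()
```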

You can add multiple triggers, but each trigger fires individually, and it's not currently possible to logically combine scenario triggers (upvote this Product Idea if you want that functionality). If you have multiple datasets being built in a project scenario and want to confirm all of them have been built before you start another project's scenario, you can add a dummy Python recipe at the end of your flow: add all the end-of-flow datasets as inputs and build a dummy output dataset in Python, which you can then use with a "Trigger on dataset change" in the other project's scenario.
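A minimal sketch of that dummy recipe (the dataset names `final_a`, `final_b`, and `build_sentinel` are made up for illustration): the core just summarises the input datasets into one tiny output, and the Dataiku I/O around it is shown as comments.

```python
# Sketch of the "dummy" end-of-flow recipe (dataset names are assumed).
# It produces one tiny dataset whose content changes whenever the inputs
# are rebuilt, so it can drive a "Trigger on dataset change" downstream.
import pandas as pd
from datetime import datetime, timezone

def build_sentinel(input_frames):
    """Summarise each final dataset into one row of a sentinel dataset."""
    rows = [
        {
            "dataset": name,
            "row_count": len(df),
            "checked_at": datetime.now(timezone.utc).isoformat(),
        }
        for name, df in input_frames.items()
    ]
    return pd.DataFrame(rows)

# In an actual Dataiku Python recipe you would wrap this with the
# standard dataset API, e.g.:
#   import dataiku
#   frames = {n: dataiku.Dataset(n).get_dataframe()
#             for n in ["final_a", "final_b"]}
#   dataiku.Dataset("build_sentinel").write_with_schema(build_sentinel(frames))
```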

So, in summary, the easiest solution for your requirement is to use a "Trigger after scenario" trigger to start scenario 1 in Project B after scenario 2 in Project A completes.

RanjithJose
Level 2
Author

Thank you, @Turribeach!

My question was more about building datasets or flow zones that belong to different projects.

Ex: Dataset 1 in flow zone 1 belongs to Project A, and Dataset 1 is shared with Project B's flow zone 2. While building flow zone 2, I want to trigger a build of flow zone 1 in Project A. I ask this because, since Dataset 1 is shared between projects, I want Dataset 1 to contain the latest information/data when I build datasets from flow zone 2 in Project B.

 


RanjithJose
Level 2
Author

Thank you! This helps in a way.

