Dependency between Projects for Building DataSets

RanjithJose Dataiku DSS Core Concepts, Registered Posts: 13 ✭✭✭✭

Is there an option to set dependencies between jobs when triggering them, as we do with DataStage ETL jobs? For example, setting a predecessor for a job that is waiting in the queue to be executed.

Ex: Imagine Project B is being built or refreshed, and within Project B there is a dataset that should be refreshed only after Project A is built/refreshed. Is there an option to automatically run the predecessor job first and then the current job? In this example, Project A would be built/refreshed first, and Project B would then be built/refreshed only if the Project A build succeeded.


Operating system used: Windows

Best Answer

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,737 Neuron
    Answer ✓

    Same thing. Share Dataset 1 from Project A to Project B, then use a Trigger on Dataset change to start your Project B scenario. There is no built-in way to build a flow zone as part of a scenario (although in v12 you can do it from the Flow in the GUI), but you can create scenarios that match your flow zones, which you will then have to maintain manually. Additionally, in v12.1 Dataiku added the "Stop at zone boundary" option to the Build scenario step (see below), which should allow you to build dynamic flow zone scenarios.

    [Screenshot 2024-02-12 at 22.16.48.png: the "Stop at zone boundary" option in the Build scenario step]

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,737 Neuron

    Projects don't get built; you build datasets using scenarios. You can add the following triggers to a scenario:

    1. Trigger on Dataset change
    2. Trigger on SQL Query change
    3. Trigger after Scenario
    4. Custom Python trigger
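
    The fourth option, a custom Python trigger, runs your own Python code and decides for itself when to fire. Below is a minimal sketch of the decision logic only, assuming you can obtain a last-modified timestamp for each watched dataset. The function name should_fire, the dataset names, and the timestamps are all hypothetical; the comment about dataiku.scenario.Trigger describes how the result might be used inside DSS and is not exercised here.

```python
from datetime import datetime


def should_fire(last_fired, last_modified):
    """Return True only when every watched dataset changed after the last firing.

    last_fired: datetime of the previous trigger firing (hypothetical input).
    last_modified: dict mapping dataset name -> datetime of its last change.
    """
    if not last_modified:
        # Nothing to watch, so never fire.
        return False
    return all(changed > last_fired for changed in last_modified.values())


# Hypothetical example values.
last_fired = datetime(2024, 2, 12, 9, 0)
watched = {
    "dataset_1": datetime(2024, 2, 12, 10, 0),  # changed after last firing
    "dataset_2": datetime(2024, 2, 12, 8, 30),  # not changed since
}
decision = should_fire(last_fired, watched)

# Inside a DSS custom trigger (assumption), a True decision would be followed by
# something like: from dataiku.scenario import Trigger; Trigger().fire()
```

    Keeping the decision in a plain function makes it easy to check outside DSS before pasting it into the trigger.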

    You can add multiple triggers, but each trigger fires individually, and it is not currently possible to logically combine scenario triggers. Upvote this Product Idea if you want this functionality. If a project scenario builds multiple datasets and you want to confirm that all of them have been built before another project's scenario starts, you can add a dummy Python recipe at the end of your flow, add all the end-of-flow datasets as its inputs, and have it build a dummy output dataset. You can then use that dummy dataset in a Trigger on Dataset change on the other project's scenario.
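
    The dummy recipe can be very small, since its only job is to produce an output that changes once every input has been built. Here is a rough sketch of what it might compute, assuming one marker row per input dataset. The function build_marker_rows, the column names, and the dataset names are made up for illustration; the write step is only described in a comment because it needs a DSS environment.

```python
from datetime import datetime, timezone


def build_marker_rows(input_names, built_at=None):
    """Return one marker row per upstream dataset, stamped with a build time."""
    if not input_names:
        raise ValueError("the marker recipe needs at least one input dataset")
    built_at = built_at or datetime.now(timezone.utc)
    stamp = built_at.isoformat()
    return [
        {"source_dataset": name, "marker_built_at": stamp}
        for name in input_names
    ]


# Hypothetical end-of-flow datasets declared as inputs of the dummy recipe.
rows = build_marker_rows(
    ["sales_final", "customers_final"],
    built_at=datetime(2024, 2, 12, tzinfo=timezone.utc),
)

# In a DSS Python recipe (assumption), the rows would then be written to the
# dummy output, e.g. dataiku.Dataset("all_built_marker").write_with_schema(
# pandas.DataFrame(rows)), where "all_built_marker" is a made-up name.
```

    Because DSS only runs the recipe after all of its inputs are built, a change on the dummy output implies that every end-of-flow dataset was built.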

    So, in summary, the easiest solution for your requirement is to use a Trigger after Scenario on scenario 1 in Project B, so that it starts after scenario 2 in Project A completes.
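
    If you would rather drive the order from a script than from a trigger, the same "A first, then B only on success" rule can be sketched as below. This is an alternative illustration, not the trigger setup described above. The function run_in_order and the outcome strings are assumptions; the two callables are stand-ins that, in DSS, might wrap a scenario run made through the public API client.

```python
def run_in_order(run_upstream, run_downstream, ok_outcomes=("SUCCESS",)):
    """Run the upstream job, then the downstream job only if upstream succeeded.

    Each argument is a zero-argument callable returning an outcome string.
    """
    upstream = run_upstream()
    if upstream not in ok_outcomes:
        # Upstream did not succeed, so the downstream job is skipped.
        return {"upstream": upstream, "downstream": None}
    return {"upstream": upstream, "downstream": run_downstream()}


# Stand-in callables that record the order in which they were invoked.
calls = []


def fake_project_a():
    calls.append("A")
    return "SUCCESS"


def fake_project_b():
    calls.append("B")
    return "SUCCESS"


result = run_in_order(fake_project_a, fake_project_b)
skipped = run_in_order(lambda: "FAILED", fake_project_b)
```

    Here result records both runs in order, while skipped shows that Project B is never started when Project A fails.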

  • RanjithJose
    RanjithJose Dataiku DSS Core Concepts, Registered Posts: 13 ✭✭✭✭

    Thank you @Turribeach!

    My question was more about building datasets or zones that come from different projects.

    Ex: Dataset 1 in flow zone 1 belongs to Project A, and Dataset 1 is shared with flow zone 2 in Project B. While building flow zone 2, I want to trigger a build of flow zone 1 in Project A. I ask because Dataset 1 is shared between the projects, and I want it to contain the latest data when I build the datasets in flow zone 2 of Project B.

  • RanjithJose
    RanjithJose Dataiku DSS Core Concepts, Registered Posts: 13 ✭✭✭✭

    Thank you! This helps in a way.
