
Programmatically Create Scenario Steps

Solved!
benmoss
Level 2

Hi All,

I'm looking for advice on a solution I'd like to develop. Specifically, we have a series of 19 projects that produce the same outputs but use slightly different logic to create them.

We want each project's scenario, which our users trigger, to be the same. However, we sometimes notice gaps in the scenario that need to be corrected or filled.

At present we replicate each change manually across all 19 project scenarios, which is error-prone: we might not replicate a change correctly!

I'm looking for ideas on a better approach. I've been exploring whether the Dataiku API can programmatically update the scenarios from a template scenario, but I've hit a wall because the 'steps' don't seem to be an updatable attribute of the scenario settings (unlike the reporters or the run-as user, which we already update programmatically).

Thanks in advance!

Ben


Operating system used: Windows

benmoss
Level 2
Author

To add to this, something might be possible with the 'scenario' API: we could write a Python script that creates our scenario, store that script in the global shared repository, and reference it from each flow.

Unfortunately, that doesn't seem possible here, because one of our steps is a folder check, and the 'scenario' API doesn't yet seem to support folder checks (it does support file checks).

I'm still on the hunt for ideas!

 

VitaliyD
Dataiker

Hi,

You can copy the steps from one scenario to another using the scenario settings. Please refer to the example below:

import dataiku

client = dataiku.api_client()
project = client.get_default_project()

# Read the steps of the source scenario (id '1')
scenario = project.get_scenario('1')
scenario_settings = scenario.get_settings()
raw_steps = scenario_settings.raw_steps

# Append the first source step to the target scenario (id '2') and save
scenario2 = project.get_scenario('2')
scenario_settings2 = scenario2.get_settings()
scenario_settings2.raw_steps.append(raw_steps[0])
scenario_settings2.save()
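Since `raw_steps` here is an ordinary Python list of step dictionaries, copying every step rather than only the first is a plain list operation. The sketch below uses plain dicts standing in for the scenario settings; the helper name `append_all_steps` and the sample steps are hypothetical. It deep-copies each step so that later edits to the target's steps in memory do not silently change the source list.

```python
import copy


def append_all_steps(target_steps, source_steps):
    """Append a deep copy of every source step to the end of target_steps (in place)."""
    for step in source_steps:
        target_steps.append(copy.deepcopy(step))
    return target_steps


# Hypothetical data standing in for scenario_settings.raw_steps
source = [{"id": "build_a", "name": "Step #1"},
          {"id": "check_b", "name": "Step #2"}]
target = [{"id": "existing", "name": "Old step"}]

append_all_steps(target, source)
print([s["id"] for s in target])  # ['existing', 'build_a', 'check_b']

# The appended copies are independent of the originals
target[1]["id"] = "renamed"
print(source[0]["id"])  # build_a
```

In DSS this would presumably be called as `append_all_steps(scenario_settings2.raw_steps, raw_steps)` followed by `scenario_settings2.save()`.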


I hope this helps.

Best, 

Vitaliy

benmoss
Level 2
Author

Thank you for this reply @VitaliyD, this was extremely helpful.

Because the code uses 'append', I understand (and have confirmed by testing) that it adds the template scenario's steps to the end of our existing scenario, which already contains steps.

Do you know how the code could be adapted so that the second project's raw steps are set to exactly the first project's steps?

I tried adjusting your code as follows, but unfortunately this did not work:

scenario_settings2.raw_steps = raw_steps
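A likely reason the assignment fails, assuming `raw_steps` is a getter-only property that returns the list stored inside the settings object (which is consistent with `append` and `del` working on it elsewhere in this thread): Python refuses to rebind a property that has no setter, but the list it returns can still be modified in place. The class below is a hypothetical stand-in, not the real `dataikuapi` settings class.

```python
class FakeScenarioSettings:
    """Hypothetical stand-in: exposes the steps list through a getter-only property."""

    def __init__(self, steps):
        self.data = {"params": {"steps": steps}}

    @property
    def raw_steps(self):
        # Returns the underlying list object itself, not a copy
        return self.data["params"]["steps"]


settings2 = FakeScenarioSettings([{"id": "old_1"}, {"id": "old_2"}, {"id": "old_3"}])
template_steps = [{"id": "new_1"}, {"id": "new_2"}]

try:
    settings2.raw_steps = template_steps  # rebinding: the property has no setter
except AttributeError as exc:
    print("rebinding failed:", type(exc).__name__)  # rebinding failed: AttributeError

# Slice assignment replaces the contents of the existing list in place,
# so the change is visible in the underlying data, whatever the old length was
settings2.raw_steps[:] = template_steps
print([s["id"] for s in settings2.data["params"]["steps"]])  # ['new_1', 'new_2']
```

If the real property behaves as sketched, the equivalent would be `scenario_settings2.raw_steps[:] = raw_steps` followed by `scenario_settings2.save()`, which also copes with a template that has more or fewer steps than the target.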

One option I can think of is to delete the existing scenario in the second project, recreate it from scratch, and then append each step from the template project's scenario.

Before I build this, is there a neater solution? For example, rather than deleting the entire scenario, is it possible to clear all of its steps and then use the append method you provided?

Thanks again.

Ben

VitaliyD
Dataiker

Hi, 

You can copy steps one at a time by index. Just make sure a step already exists at that index. Example:

# copy a single step by index (index 0 must already exist in the target)
scenario_settings2.raw_steps[0] = raw_steps[0]
# or modify the existing step as required by assigning a full step definition
scenario_settings2.raw_steps[0] = {
    'delayBetweenRetries': 10,
    'id': 'reload_schema__d_us_50',
    'maxRetriesOnFail': 0,
    'name': 'Step #1',
    'params': {
        'items': [{'itemId': 'us_50', 'partitionsSpec': '', 'type': 'DATASET'}],
        'proceedOnFailure': False,
    },
    'resetScenarioStatus': False,
    'runConditionExpression': '',
    'runConditionStatuses': ['SUCCESS', 'WARNING'],
    'runConditionType': 'RUN_IF_STATUS_MATCH',
    'type': 'reload_schema',
}
# persist the changes
scenario_settings2.save()

Best,

Vitaliy
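Index assignment only replaces a position that already exists; assigning past the end of a list raises `IndexError` instead of extending it. A minimal sketch of a hypothetical helper, `put_step`, that replaces a step by index and appends when the index is exactly one past the end (plain lists standing in for `raw_steps`):

```python
import copy


def put_step(target_steps, index, step):
    """Replace the step at `index`, or append it when index == len(target_steps).

    Refuses larger indexes so the step list never gets gaps.
    """
    if index < len(target_steps):
        target_steps[index] = copy.deepcopy(step)
    elif index == len(target_steps):
        target_steps.append(copy.deepcopy(step))
    else:
        raise IndexError(
            f"cannot place step at {index}; target has {len(target_steps)} steps"
        )
    return target_steps


steps = [{"id": "step_1"}]
put_step(steps, 0, {"id": "replaced"})  # existing position: replaced
put_step(steps, 1, {"id": "appended"})  # one past the end: appended
print([s["id"] for s in steps])  # ['replaced', 'appended']
```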

benmoss
Level 2
Author

Right, that makes sense. However, it's quite plausible that we will change the number of steps in the scenario (for example, the template scenario now has five steps but the scenario we want to overwrite has six).

Can you confirm there is no delete-step action?

Either way, the information you provided is great and has brought me to a solution of sorts: to handle the differing step counts described above, delete the scenario, create a new one, and then use the append method you provided.

Thanks again!

Ben

VitaliyD
Dataiker

The steps are just a Python list in the scenario settings, so you can delete a step by removing an element from the list. Example:

del scenario_settings2.raw_steps[0]
scenario_settings2.save()
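Because the steps form a list, the usual list operations apply. A minimal sketch with plain dicts standing in for `raw_steps`: removing a step by its `id` key (a field that appears in the step definition earlier in this thread) and clearing every step at once with `del steps[:]`, which does the same job as deleting `steps[0]` repeatedly. The helper name `remove_step_by_id` and the sample ids other than `reload_schema__d_us_50` are hypothetical; in DSS you would still call `save()` afterwards.

```python
def remove_step_by_id(steps, step_id):
    """Remove, in place, every step whose 'id' equals step_id; return how many were removed."""
    before = len(steps)
    # Slice assignment keeps the same list object, so the settings see the change
    steps[:] = [s for s in steps if s.get("id") != step_id]
    return before - len(steps)


steps = [{"id": "reload_schema__d_us_50"}, {"id": "build_x"}, {"id": "notify_y"}]
removed = remove_step_by_id(steps, "build_x")
print(removed, [s["id"] for s in steps])  # 1 ['reload_schema__d_us_50', 'notify_y']

del steps[:]  # clear all remaining steps in place
print(steps)  # []
```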

Best.

benmoss
Level 2
Author

For completeness, here is the code we deployed for our solution:

import dataiku

client = dataiku.api_client()

# Read the steps of the template scenario
template_project = client.get_project("TEMPLATEPROJECTID")
template_scenario = template_project.get_scenario("TEMPLATE_SCENARIO_ID")
template_scenario_settings = template_scenario.get_settings()
template_steps = template_scenario_settings.raw_steps

# Projects whose scenario should mirror the template
list_projects = ["PROJECT A", "PROJECT B", "PROJECT C"]

for p in list_projects:
    project = client.get_project(p)
    scenario = project.get_scenario("COMMON_SCENARIO_ID")
    scenario_settings = scenario.get_settings()

    # Remove every existing step (raw_steps is a plain Python list)
    del scenario_settings.raw_steps[:]

    # Append the template's steps in order
    for step in template_steps:
        scenario_settings.raw_steps.append(step)

    scenario_settings.save()

Thanks again @VitaliyD for helping us reach a solution!
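Since the original concern was that manual edits might not be replicated correctly, a quick check after syncing can confirm that each target scenario matches the template. A sketch, assuming steps can be compared as plain dicts (the form `raw_steps` returns them in); the function name `find_mismatches` and the sample data are hypothetical, and in DSS the inputs would come from each project's `get_settings().raw_steps`. If DSS were to add default fields to steps on save, an exact comparison could report false differences, and comparing only selected keys such as `id` and `type` would be a looser alternative.

```python
def find_mismatches(template_steps, steps_by_project):
    """Return the sorted project keys whose step list differs from the template."""
    return sorted(
        key for key, steps in steps_by_project.items() if steps != template_steps
    )


# Hypothetical step lists keyed by project
template = [{"id": "step_1", "name": "Build"}, {"id": "step_2", "name": "Check"}]
projects = {
    "PROJECT A": [{"id": "step_1", "name": "Build"}, {"id": "step_2", "name": "Check"}],
    "PROJECT B": [{"id": "step_1", "name": "Build"}],
}
print(find_mismatches(template, projects))  # ['PROJECT B']
```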

CoreyS
Dataiker Alumni

Thank you for sharing this solution with our Community @benmoss!
