'Sub' scenario aborts 'main' scenario

Solved!
farhanromli
Level 3

I have created a scenario (Scenario A) that triggers another scenario (Scenario B).

I want to abort Scenario B when a certain condition is met (which should also abort Scenario A).

My thought is to introduce a Kill scenario step in Scenario B that kills Scenario A (which, in turn, should abort Scenario B).

Will this design work fine, and is there anything I should take note of?

P.S. This question is sort of a follow-up to: Solved: Re: Self-abort scenario - Dataiku Community


Operating system used: Windows 10

shivam
Level 1

Did you find the answer?

farhanromli
Level 3
Author

Hi,

My internal team raised a ticket with Dataiku to check on this approach a long time ago and, if I remember correctly, they advised against it (for reasons I can't recall).

Turribeach

I think that rather than killing any scenario, you should implement metrics and checks and have a step that calculates them. When a check fails, the calculate-metrics scenario step fails, and if all the following steps are configured with the default options, they will not execute and the scenario will end in failure. If you can't use this design, post a new question with your specific requirements and what you are trying to achieve.

farhanromli
Level 3
Author

Yes, we have implemented something similar, but there are situations where we don't want the scenario status to end up as Failed. So instead of failing it, we designed it to skip the remaining steps by configuring conditions on the scenario steps.

Turribeach

I don't understand what your problem is then.

farhanromli
Level 3
Author

No worries, my team has sorted this already anyway. 

CH007
Level 2

Hi there,

I'm going to take a stab at your issue with some recommendations. My apologies if you have already thought of this before:

  1. Instead of having Scenario A trigger Scenario B and then aborting Scenario A when a certain condition is satisfied (you cannot really go back in time: Scenario A has already run in order to trigger Scenario B), how about having just one scenario, Scenario B, and executing it only when a particular condition is met? If the condition is not met, no action is taken.
  2. Within Dataiku you can set the scenario up with a Time-based trigger, so if you want to check during certain hours/time frames of the day whether a particular condition is met, you can leverage the Time-based trigger, or create a custom trigger.

 

CH007_0-1706845467546.png

 

  3. Lastly, I'm not entirely sure what condition you're trying to check, but in the Steps menu of the scenario you can click ADD STEP and select Execute Python code. This opens the Python editor, where you can write code covering all of the conditions you want to check. If you require multiple steps/checks, you can build more steps to run in sequential order. I'm recommending a custom Python script because it gives you more freedom, flexibility and control over your conditions than Metrics/Checks.
  4. In a nutshell: you set the scenario to kick off at certain intervals, and the scenario runs a Python script (custom step) that performs all the checks you're looking for. If the conditions are not met, your script can simply take no action; if they are, the script carries out the specific steps you're requesting.
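To illustrate, the condition logic inside such a Python step could look something like this (the dataset names and the counts dictionary are made up; in DSS you would fill the counts from dataset metrics or by reading the datasets):

```python
# Illustrative only: the kind of precondition check such a Python step could hold.
# In a real scenario step, record_counts would come from dataset metrics.

def failed_preconditions(record_counts, required_datasets):
    """Return the names of required datasets that have no rows."""
    return [name for name in required_datasets if record_counts.get(name, 0) == 0]

empty = failed_preconditions({"orders": 120, "refunds": 0}, ["orders", "refunds"])
print(empty)  # -> ['refunds']
```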

CH007_1-1706845467550.png

Regards, 

Christie 

 

farhanromli
Level 3
Author

Thanks Christie,

 

Before I respond, I'd like to mention that my team has already approached this in a different way, so technically this is case closed.

But my responses to your points:

  1. My team's Dataiku projects usually have multiple data flows, and these flows are usually linked to each other. When we design our scenarios (I am oversimplifying, but) we usually create one scenario per data flow/zone. Say I have 3 zones; there will be Scenarios B, C and D. We then create Scenario A, which triggers B, C and D in sequence. One of the main reasons we design it this way is that it makes it easy to control the scenario runs, especially during testing. So your suggestion to run Scenario B as a stand-alone scenario does not fit our use case.
  2. The main check we do is on record count. If the record count of certain datasets is 0, we don't want to continue processing.
  3. So if Scenario B finds a particular dataset empty, we don't want Scenario B to resume, and in turn we don't want to trigger C and D either, since they are interconnected.

To achieve the above, our solutions are:

  1. We use a Python step to update a project variable, let's call it "run_status". At the beginning of a scenario run we default its value to "Success". When certain conditions are met, i.e. a certain dataset is empty, we use a Python step to set run_status = "Aborted".
  2. This run_status variable is used as a condition on the scenario steps. A condition can be set on a scenario step by changing "Run this step" from "No prior steps failed" to "If condition satisfied".
  3. With this, we can skip the remaining steps of the scenarios; the scenario completes without executing them.

 

Regards,

Farhan

Turribeach

I think your approach is OK, but I also think it can be improved further. For a start, I don't like solutions that use project variables. Project variables are not ephemeral, which means they preserve their value across scenario runs. This can leave your project variables in an inconsistent state, which could in turn cause something to execute when it shouldn't. The best approach for what you are trying to do is to define scenario variables, which are always evaluated at run time. That way, even if you execute scenarios in a different order than intended, you will catch the issue. Scenario variables can be set either with a Define variables scenario step or with a custom Python scenario step, which looks like this:

from dataiku.scenario import Scenario

scenario = Scenario()
# Fetch all variables visible to this scenario run
all_scenario_vars = scenario.get_all_variables()
all_scenario_vars['new_scenario_var'] = "new value"
# Write them back as scenario (run-time) variables
scenario.set_scenario_variables(**all_scenario_vars)

Having said that, your requirement could be met without any Python code at all. In this post I showed how to define a scenario variable based on a metric and then conditionally exe.... In that case I was using the count of files in a folder metric, but I also covered a similar solution in this thread, where it uses the dataset record count.

This solution is much easier to maintain and visualise, since there is no Python code and all the variables used are run-time variables, so they will always be evaluated at run time, leaving no risk of executing things incorrectly because project variables were left with stale values. There is also no confusing "run_status" that means nothing to someone new to the flow. The scenario steps use count_of_rows_datasetX variables, which anyone looking at the flow will understand. Finally, if you were to use a mail reporter to indicate that the flow was aborted due to empty datasets, you could easily include all the relevant datasets and their row-count variable values, so that someone reading the email can see at a glance which datasets caused the scenario run to be aborted (there could be more than one, for instance).
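For illustration, the "Run this step" condition on the downstream steps would then be a DSS formula along these lines (count_of_rows_datasetX is a hypothetical variable name):

```
variables["count_of_rows_datasetX"] > 0
```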

 

 

 

 
