Want to Control the Execution of Scenario Steps With Conditional Logic? Here’s How!
Scheduled File Downloading and Processing
I have a Dataiku scenario with multiple steps. One of these steps includes downloading some files from a third-party website. In some instances, there won't be any files downloaded. As a result, the next step fails as there are no files to process.
In this case, how can I configure my scenario to not execute the remaining steps when there are no files in my downloaded files folder?
My Challenge: Conditionally Abort Scenario Steps With No Alerts
Since the absence of downloaded files on some days is a normal situation, I wanted to prevent any “warning” or “failed” alerts from being triggered. I also wanted a "clickers" solution with minimal or no code. (Note: Check out the original user question and solution on the Dataiku Community to see an alternative solution that uses Python code.)
My Aha Moment: Conditionally Controlling if a Step is Executed Based on Metric Value
With the help of Dataiku Support, I came to this conclusion: By using the "If condition is satisfied" option on a scenario step alongside scenario variables, you can conditionally control whether a step is executed based on the value of a metric (in this case, the number of files in a folder).
Here are the step-by-step instructions to follow:
1. Create a scenario step to "Compute metrics" for the folder (let's call this step Compute_Metrics).
2. Next, create a scenario step to "Define scenario variables".
3. On the Define scenario variables step, toggle the "Evaluated variables" ON.
4. Then, define a new variable (let's call it number_of_files) with this formula:
toNumber(filter(parseJson(stepOutput_Compute_Metrics)['ProjectID.FolderID_NP']['computed'], x, x["metricId"]=="basic:COUNT_FILES")[0].value)
5. Replace ProjectID.FolderID with your own project key and folder ID, keeping the _NP suffix (see the worked example after this list). Note that "Compute_Metrics" refers to the name of the previous step where you computed the metrics for the folder.
6. Finally, in your conditional step, set "Run this Step" to "If condition is satisfied" and set the condition to number_of_files >= 1.
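For example, with a hypothetical project key MYPROJECT and a folder whose ID is aBcDeFgH (placeholder values for illustration only; use your own), the evaluated variable and the step condition would look like this:
number_of_files:
toNumber(filter(parseJson(stepOutput_Compute_Metrics)['MYPROJECT.aBcDeFgH_NP']['computed'], x, x["metricId"]=="basic:COUNT_FILES")[0].value)
Run this Step: "If condition is satisfied", with the condition:
number_of_files >= 1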
That's it! The step will conditionally execute based on the metric value of a folder — no more failures or warnings.
Pro Tip: Use Boolean Logic to Simultaneously Preserve Normal Step Failure Behavior
One last thing to add: using the "If condition is satisfied" option may also have the unwanted side effect of overriding the default behavior that scenario steps only execute "if no prior step failed."
As per the Step Flow Control documentation, the possible values for outcome (which holds the current outcome of the scenario) are 'SUCCESS', 'WARNING', 'FAILED', and 'ABORTED'. So if any previous step failed, outcome will be 'FAILED', and adding a check on outcome to the condition keeps the step from executing. In my case, I still wanted to ensure that no steps were executed if a prior step had failed, including those that used the "If condition is satisfied" option.
As a result, my actual condition ended up being this:
number_of_files >= 1 && outcome == 'SUCCESS'
My key takeaway? If you use the "If condition is satisfied" option to evaluate a variable, remember to take this side effect into consideration.
Enjoy!
Additional Resources
Read more about this user question and the solution here: Conditional execute of scenario step without steps failing or giving warnings
To learn more about scenarios in Dataiku, visit this tutorial in the Knowledge Base: Concept: Scenarios
Answers
-
tgb417
Great tip.
How did you figure out the 'Shaker' formula code that you ended up using in the step?
toNumber(filter(parseJson(stepOutput_Compute_Metrics)['ProjectID.FolderID_NP']['computed'], x, x["metricId"]=="basic:COUNT_FILES")[0].value)
Do you have any tricks to suggest if you had other conditions you wanted to monitor?
-
Turribeach
Thanks @tgb417. That is a great question, and one that I should have documented along with the trick, but better late than never! There are several options for working with the resulting JSON; the one I used is this. The first step is to get the full JSON output so you can start playing with it. So, in a scenario, we add a step to compute the metrics we want to extract values from. In my test I am calculating the metrics of a folder. It's important to give the step a meaningful name without spaces. Next we create a Define scenario variables step, toggle the Evaluated variables setting, and define a variable as:
parseJson(stepOutput_Compute_Metrics)
Note that the suffix after stepOutput_ is the name of the step you want to get the metrics from.
If you now run this scenario, you will find the value of the variable in the scenario execution logs:
[2023/01/27-10:03:22.599] [FT-ScenarioThread-onZHv2rB-5396] [INFO] [dip.scenario.step.definevars] scenario SECFILLINGS.TEST#2023-01-27-10-03-22-500 - [ct: 61] Update variable initial_json = parseJson(stepOutput_Compute_Metrics)
[2023/01/27-10:03:22.600] [FT-ScenarioThread-onZHv2rB-5396] [INFO] [dip.scenario.step.definevars] scenario SECFILLINGS.TEST#2023-01-27-10-03-22-500 - [ct: 62] --> Evaluated to {"SECFILLINGS.CaMoYxZE_NP":{"partition":"NP","computed":[{"metricId":"basic:SIZE","metric":{"metricType":"SIZE","dataType":"BIGINT","id":"basic:SIZE","type":"basic"},"dataType":"BIGINT","value":"0"},{"metricId":"basic:COUNT_FILES","metric":{"metricType":"COUNT_FILES","dataType":"BIGINT","id":"basic:COUNT_FILES","type":"basic"},"dataType":"BIGINT","value":"0"},{"metricId":"reporting:METRICS_COMPUTATION_DURATION","metric":{"metricType":"METRICS_COMPUTATION_DURATION","dataType":"BIGINT","id":"reporting:METRICS_COMPUTATION_DURATION","type":"reporting"},"dataType":"BIGINT","value":"5"}],"startTime":1674813802547,"endTime":1674813802552,"runs":[{"engine":"Basic"}],"target":
The value of the metrics output is the JSON shown after "Evaluated to" in the log line, up to the (class org.json.JSONObject) part (which you need to exclude). Once you have the full JSON value, you can play with it to start constructing your formula. Usually I copy/paste the JSON into a JSON formatter to quickly understand the structure. If you don't have one on your machine, you can use https://jsonformatter.org/.
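For reference, here is the relevant part of that JSON pretty-printed (the nested "metric" objects and the fields after the truncation point in the log are omitted for brevity):
{
  "SECFILLINGS.CaMoYxZE_NP": {
    "partition": "NP",
    "computed": [
      { "metricId": "basic:SIZE", "dataType": "BIGINT", "value": "0" },
      { "metricId": "basic:COUNT_FILES", "dataType": "BIGINT", "value": "0" },
      { "metricId": "reporting:METRICS_COMPUTATION_DURATION", "dataType": "BIGINT", "value": "5" }
    ],
    "startTime": 1674813802547,
    "endTime": 1674813802552,
    "runs": [ { "engine": "Basic" } ]
  }
}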
In my case I was interested in the COUNT_FILES metric value. To build the formula, I usually go to any Prepare recipe, add a Formula step and click on "Open the Editor Panel". Now paste the resulting JSON output from the log and enclose it in the parseJson() function. You need to wrap the whole JSON in single quotes, as parseJson() expects a string. Now you can look at the Sample output of the formula and start working out how to extract the desired values.
To get to the value I want, I first need to limit the output to the ["ProjectID.FolderID_NP"]["computed"] section by adding ["SECFILLINGS.CaMoYxZE_NP"]["computed"] (my project key and folder ID) at the end of the formula.
If we look at the resulting output in the JSON formatter, we have now narrowed the JSON down to the three computed metrics.
The next step is to filter for the desired metric entry. This can be done with the very handy filter() function.
Unfortunately, Dataiku seems to throw some validation errors even though the syntax is correct (I have reported this to Dataiku Support). You can still save, though, and the function will work fine and extract the desired values. With the filter applied, the output is just the COUNT_FILES metric entry.
Here I used the "Show complete value" option on the new dummy column to see the output, rather than using the JSON formatter. We finally have the desired metric selected, so we can simply add [0].value at the end to get its value, and we also enclose everything in a toNumber() function to make sure we get a number data type that we can use in our conditional expressions.
Obviously if you need the output to be a string you can skip the toNumber() function.
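To recap the whole progression in scenario-variable form (the intermediate screenshots aren't reproduced here, so this is reconstructed from the steps above, using my SECFILLINGS.CaMoYxZE_NP key):
1. The full JSON output of the metrics step:
parseJson(stepOutput_Compute_Metrics)
2. Limited to the array of computed metrics:
parseJson(stepOutput_Compute_Metrics)["SECFILLINGS.CaMoYxZE_NP"]["computed"]
3. Filtered down to the COUNT_FILES entry:
filter(parseJson(stepOutput_Compute_Metrics)["SECFILLINGS.CaMoYxZE_NP"]["computed"], x, x["metricId"]=="basic:COUNT_FILES")
4. Its value, converted to a number:
toNumber(filter(parseJson(stepOutput_Compute_Metrics)["SECFILLINGS.CaMoYxZE_NP"]["computed"], x, x["metricId"]=="basic:COUNT_FILES")[0].value)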
I would be interested to know if anyone has a better way of doing this in a "Clicker" way (i.e. without Python). Enjoy!
-
tgb417
Great post, thank you. I use a few variants of your process. I'm not sure whether mine are any better or worse than your approach, but I thought I'd share in case they are helpful to you or others reading this post.
- First, when trying to capture the available values, I write them to the local project variables with a scenario step that sets project variables. In that step I'll first use parseJson() like you are showing. I've done this rather than pulling the details out of the scenario log. I'm not sure I had found the base object the way you show in your lovely parseJson() command. The local project variables screen does a fairly good job of formatting the JSON.
- That said, when I have to clean up JSON to look at it directly, I've tended to copy the JSON into Visual Studio Code and use a JSON formatter there.
Two other points you might be interested in:
- I’ve made a Dataiku Product Idea around this topic: https://community.dataiku.com/t5/Product-Ideas/Enhanced-UI-UX-in-the-Senario-Step-Builder/idi-p/31029 If you think this is a good idea for clickers, please upvote it and add any comments that might make your experience better.
- Second, you may know that in Prepare recipes you can copy and paste steps. As a clicker, you may have also discovered that there are some steps that are hard to edit once created (like reordering columns once the step is created, or updating long lists of replacements). My trick is to copy the Prepare recipe step as usual into your computer's clipboard. Then, rather than pasting it directly back into Dataiku DSS, paste it into a JSON editor. This shows you the JSON that the Dataiku team uses to define the Prepare recipe step. Make the change to the recipe step's JSON in the JSON editor; I've found the JSON to be fairly self-explanatory. Then copy the JSON out of the editor and paste it back into the Prepare recipe. Now you have a new step set up the way you want. This is of course "coloring outside of the lines" a bit, but if you are careful you can definitely save some time.
Hope some of that is helpful to you or others reading this thread. And thank you for sharing this detailed set of tips and tricks.
-
Turribeach
PS: Dataiku Support has confirmed that the issue with the "unknown tokens" validation errors has been fixed in v11.2.1 and higher. In 11.2.1 there is no longer a validation error in the Formula Editor, and the preview Sample output is displayed properly. I have upvoted your Idea and I will add a reference to this post as well, to give Dataiku more context and different use cases.
-
This was a great, informative post. Using it, I created a similar notification, but it required one small change in syntax.
For the Scenario Condition:
number_of_files >= 1 && outcome == 'SUCCESS'
I had to modify it to read this for the run condition to work:
number_of_files >= '1' && outcome == 'SUCCESS'
Without the quotes, the condition failed even though it should have passed.
Thank you!
-
Turribeach
You must be missing the final toNumber() when you define your variable, so your variable is a string, not a number. If you look at the log of the Define scenario variables step, you should see the variable being evaluated and its data type. You should make sure it's defined as a number so it can be compared properly.
-
You are absolutely spot on! I originally had the toNumber() but wanted to find a way to convert to an integer. Then fast forward a few weeks and here we are. I'm going to modify the condition so it remains text rather than '0'. Thank you for the quick reply!
-
Turribeach
The toNumber() function handles the conversion for you, so you don't need to worry about that. In fact, you are getting an integer. If you look at the step log when the variable is defined, you will see something like this:
--> Evaluated to 999 (class java.lang.Long)
The Dataiku backend runs in Java so the data types are of course from Java. "The long data type is a 64-bit two's complement integer. The signed long has a minimum value of -2^63 and a maximum value of 2^63 - 1." This data type is also called BigInt.
As a test I just did a custom metric and returned a number with decimals in the metric value. Then I converted it using the toNumber() function and this is what the step log showed:
--> Evaluated to 42.123456789 (class java.lang.Double)
As you can see the toNumber() function correctly defined the variable as a double.
-
Thank you! Knowing how to read the logs to understand what is happening will be very helpful for my future projects.
The only comment I would add: I did not see this in the step log, but I did find it in the scenario log.