I have a Dataiku scenario with multiple steps. One of these steps includes downloading some files from a third-party website. In some instances, there won't be any files downloaded. As a result, the next step fails as there are no files to process.
In this case, how can I configure my scenario to not execute the remaining steps when there are no files in my downloaded files folder?
My Challenge: Conditionally Abort Scenario Steps With No Alerts
Since having no downloaded files on some days is a normal situation, I wanted to prevent any “warning” or “failed” alerts from being triggered. I also wanted a "clickers" solution with minimal or no code. (Note: Check out the original user question and solution on the Dataiku Community to see an alternative solution that uses Python code.)
My Aha Moment: Conditionally Controlling if a Step is Executed Based on Metric Value
With the help of Dataiku Support, I came to this conclusion: By using the "If condition is satisfied" option on a scenario step alongside scenario variables, you can conditionally control whether a step is executed based on the value of a metric (in this case, the number of files in a folder).
Here are the step-by-step instructions to follow:
1. Create a scenario step to "Compute metrics" for the folder (let's call this step Compute_Metrics).
2. Next, create a scenario step to "Define scenario variables".
3. On the Define scenario variables step, toggle the "Evaluated variables" ON.
4. Then, define a new variable (let's call it number_of_files) with this formula:
5. Replace ProjectID.FolderID with your own project ID and folder ID. Note that "Compute_Metrics" refers to the name of the previous step, where you computed metrics for the folder.
6. Finally, in your conditional step, set "Run this Step" to "If condition is satisfied" and the condition to “number_of_files >= 1”.
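The logic behind the steps above can be modeled in plain Python. This is only an illustrative sketch, not Dataiku's formula language: the JSON layout of the step output and the key names `computed` and `COUNT_FILES` are hypothetical, so inspect the actual output of your Compute_Metrics step rather than relying on them.

```python
import json


def number_of_files(step_output_json, folder_ref):
    """Read the file count for folder_ref from a metrics step output.

    The payload layout used here is a hypothetical stand-in,
    not the documented Dataiku format.
    """
    payload = json.loads(step_output_json)
    return int(payload[folder_ref]["computed"]["COUNT_FILES"])


def should_run(step_output_json, folder_ref):
    """Mirror the step condition: number_of_files >= 1."""
    return number_of_files(step_output_json, folder_ref) >= 1


# Two hypothetical step outputs: a day with no downloads and a day with three.
empty_day = json.dumps({"ProjectID.FolderID": {"computed": {"COUNT_FILES": 0}}})
busy_day = json.dumps({"ProjectID.FolderID": {"computed": {"COUNT_FILES": 3}}})
```

Here `should_run(empty_day, "ProjectID.FolderID")` evaluates to false, so the downstream step would simply be skipped rather than failing.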
That's it! The step will now execute conditionally based on the folder's metric value, with no more failures or warnings on days without files.
Pro Tip: Use Boolean Logic to Simultaneously Preserve Normal Step Failure Behavior
One last thing to add: using the "If condition is satisfied" option may also have the unwanted side effect of overriding the default behavior that scenario steps only execute "if no prior step failed."
As per the Step Flow Control documentation, the possible values for outcome (which holds the current outcome of the scenario) are ‘SUCCESS’, ‘WARNING’, ‘FAILED’, and ‘ABORTED’. If any previous step failed, outcome = FAILED, and a condition that checks outcome can use this to keep conditional steps from executing. In my case, I still wanted to ensure that no steps were executed if a prior step had failed, including those that used the "If condition is satisfied" option.
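As a result, my actual condition ended up combining a check on outcome with the number_of_files >= 1 check, so the step runs only when no prior step has failed and at least one file is present.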
My key takeaway? If you use the "If condition is satisfied" option to evaluate a variable, remember to take this side effect into consideration.
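The combined check described above can be modeled with a short Python sketch. The function name `should_run_step` is hypothetical, and requiring SUCCESS (rather than only excluding FAILED) is an assumption; whether a WARNING outcome should still let the step run is a choice you would make for your own scenario.

```python
# Outcome values listed in the Step Flow Control documentation.
OUTCOMES = {"SUCCESS", "WARNING", "FAILED", "ABORTED"}


def should_run_step(outcome, number_of_files, allowed=("SUCCESS",)):
    """Run the step only if the scenario is still healthy AND files exist.

    `allowed` is an assumption: pass ("SUCCESS", "WARNING") to tolerate warnings.
    """
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    return outcome in allowed and number_of_files >= 1
```

With this model, a prior failure blocks the step even when files are present, and an empty folder blocks it even when every prior step succeeded.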