Conditional execute of scenario step without steps failing or giving warnings

Solved!
Turribeach

Hi,

I have a scenario that runs other scenarios as steps. One of these steps downloads files from a third-party website. Sometimes no files are downloaded, so the next step fails because there is nothing to process. I would like to skip all the remaining scenario steps when there are no files in my folder. However, I don't want any step to end in a warning or failure: getting no files on some days is a normal situation, and we don't want it to trigger alerts.

There is a similar post already, but it requires the use of metrics, which will cause the step to fail or give a warning:

https://community.dataiku.com/t5/Using-Dataiku/Scenario-Conditional-on-Dataset-Metric/m-p/8474

Any ideas on how to do this? I am thinking of using scenario variables but haven't seen a good example yet. Thanks


8 Replies
fchataigner2
Dataiker

Hi,

You can use the run condition on scenario steps for that. For example, with an "Execute Python code" step like this one:

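import dataiku
from dataiku.scenario import Scenario

# List the files in the managed folder named 'f'
mf = dataiku.Folder('f')
contents = mf.list_paths_in_partition()

# Expose the result as a scenario variable for later run conditions
f_is_empty = len(contents) == 0
Scenario().set_scenario_variables(f_is_empty=f_is_empty)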

All the steps after this code can set up a run condition like:

[Screenshot 2022-06-29 at 07.24.02.png: run condition settings of a scenario step]

Then, when there are files in the folder "f", the subsequent steps run, and when there are no files, they are simply skipped.

Turribeach
Author

Thanks fchataigner2, that's a valid solution and one I initially thought of using, but I decided to see if I could do it the "clicker" way, without Python code. With the help of Dataiku Support I came to this solution:

  1. Create a scenario step to "Compute metrics" for the folder (let's call this step Compute_Metrics).
  2. Next, create a scenario step to "Define scenario variables".
  3. On the "Define scenario variables" step, toggle "Evaluated variables" ON.
  4. Define a new variable (let's call it number_of_files) with this formula: toNumber(filter(parseJson(stepOutput_Compute_Metrics)['ProjectID.FolderID_NP']['computed'], x, x["metricId"]=="basic:COUNT_FILES")[0].value)
  5. Replace ProjectID.FolderID with your own values. Note that "Compute_Metrics" in stepOutput_Compute_Metrics refers to the name of the earlier step where you computed metrics for the folder.
  6. Finally, in your conditional step, set "Run this Step" to "If condition is satisfied" and the condition to: number_of_files >= 1

That's it! The step will execute conditionally based on the value of a folder metric. No failures or warnings.
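
For readers who find the formula in step 4 hard to read, the same extraction can be sketched in plain Python. This is only an illustration: the sample payload is hypothetical, with a shape inferred from the formula itself (a JSON object keyed by 'ProjectID.FolderID_NP' that holds a 'computed' list of metrics), and count_files is a made-up helper name, not a Dataiku API.

```python
import json

# Hypothetical step output; the shape is inferred from the formula,
# not copied from a real scenario log.
step_output = json.dumps({
    "ProjectID.FolderID_NP": {
        "computed": [
            {"metricId": "basic:SIZE", "value": "2048"},
            {"metricId": "basic:COUNT_FILES", "value": "3"},
        ]
    }
})


def count_files(raw_output, key="ProjectID.FolderID_NP"):
    """Mirror the formula: parse, filter on metricId, take the first value."""
    computed = json.loads(raw_output)[key]["computed"]
    matches = [m for m in computed if m["metricId"] == "basic:COUNT_FILES"]
    return float(matches[0]["value"])  # plays the role of toNumber(...)


number_of_files = count_files(step_output)
```

With this sample payload, number_of_files is 3, so a condition such as number_of_files >= 1 would let the step run.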

CoreyS
Dataiker Alumni

Thank you for sharing your solution @Turribeach!

Turribeach
Author

One last thing to add. The "If condition is satisfied" option lets you control whether a step executes based on the value of a metric, in this case the number of files in a folder. But it can also have an unwanted side effect. By default, scenario steps only execute "if no prior step failed", and choosing "If condition is satisfied" replaces that default, so you should take this into consideration when you use it to evaluate a variable. In my case I still wanted to make sure no steps executed if a prior step had failed, including those that use the "If condition is satisfied" option. So my actual condition ended up being this:

number_of_files >= 1 && outcome == 'SUCCESS'

 
  • outcome: the current outcome of the scenario; possible values are 'SUCCESS', 'WARNING', 'FAILED', 'ABORTED'

So outcome holds the current state of the scenario, which means that if any prior step failed, outcome is 'FAILED' and the conditional steps will not execute. Enjoy!
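
As a rough illustration of how the combined condition behaves, here is the same logic written in Python. The should_run helper is a hypothetical name used only for this sketch; in the scenario itself the condition stays the one-line expression quoted in the post.

```python
def should_run(number_of_files, outcome):
    """Python equivalent of: number_of_files >= 1 && outcome == 'SUCCESS'."""
    return number_of_files >= 1 and outcome == "SUCCESS"


# A few combinations of file count and scenario outcome
cases = [(3, "SUCCESS"), (0, "SUCCESS"), (3, "FAILED"), (3, "WARNING")]
results = {case: should_run(*case) for case in cases}
```

Only the first case (files present and no earlier failure or warning) evaluates to true; the step is skipped in the other three.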

me2
Level 3

Thank you for sharing the step-by-step. I am getting an error in the "Define variable" step, most likely because of how I am porting over the formula.

Checking the scenario log, I think I got the metric name right. In my case I'm using "records:COUNT_RECORDS".

I also think I got the project ID right; I retrieved it from the URL.

I think I'm getting the folder ID wrong. I looked up how to retrieve the folder ID, but it doesn't seem to apply to this case. Any ideas or examples someone can share?

Error:

java.lang.Exception
parseJson failed: Missing value at 0 [character 1 line 1]
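
The message suggests that parseJson found nothing to parse at position 0. One possible cause, which is an assumption and not confirmed anywhere in this thread, is that the stepOutput_... variable is empty, for example because the step name used in the formula does not match the name of the "Compute metrics" step. Python's json module also rejects an empty string, which the sketch below shows; safe_parse is a hypothetical helper name.

```python
import json


def safe_parse(raw):
    """Return parsed JSON, or None when the input is empty or invalid."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None


empty_result = safe_parse("")          # an empty step output cannot be parsed
valid_result = safe_parse('{"a": 1}')  # a non-empty JSON object parses fine
```

If that is the cause, checking the exact step name in the scenario is a reasonable first thing to try.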

Turribeach
Author

If you open the folder in your project, its ID will be shown in the web browser URL.

me2
Level 3

The project is not in any folder, so no folder name is listed in the URL.

My URL is .../projects/projectID/scenarios/.

So if the project is in the "base" folder, would the syntax ['ProjectID.FolderID_NP'] equal ['ProjectID']? I tried several variations with no success.

Thank you.

Turribeach
Author

The folder is not the folder where the project lives but the folder in the Flow that holds the files I was trying to count. At this stage it would be best for you to start a new thread and describe exactly what you are trying to achieve, because it doesn't look like the same situation as the one I described in my thread.