How to terminate the dataiku flow if the output of any recipe is an empty dataset.
Answers
-
Miguel Angel Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 118 Dataiker
Hi,
You can take advantage of scenarios to do so.
One scenario step can be a 'Run checks' step on the particular dataset that could be empty. If the subsequent steps are set so that they do not run when a prior step failed, processing of the flow will stop if that dataset is indeed empty.
The checks the scenario will look for are those selected in the dataset's Status > Checks page.
To create a check, go to Status > Edit > Checks. Here you can create a check that looks at the number of rows in the dataset, for example.
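If the built-in row-count check is not flexible enough, a custom Python check is another option. The following is only a minimal sketch: the `process(last_values, dataset, partition_id)` signature and the `records:COUNT_RECORDS` metric id are what I recall from the DSS custom check template, so please verify both against your DSS version. It also assumes the record count metric has been computed before the check runs.

```python
# Sketch of a custom Python check that flags an empty dataset.
# Assumptions (verify in your DSS version): the signature of process(),
# the metric id "records:COUNT_RECORDS", and get_value() on the metric point.
def process(last_values, dataset, partition_id):
    point = last_values.get("records:COUNT_RECORDS")
    if point is None:
        # The metric was never computed, so the check cannot decide.
        return "ERROR", "record count metric has not been computed"
    count = int(point.get_value())
    if count == 0:
        return "ERROR", "dataset is empty"
    return "OK", "%d rows" % count
```

An 'ERROR' outcome is what makes the 'Run checks' scenario step fail, which in turn stops the later steps.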
-
You can also use the "Empty as not ready" option in a dataset's advanced settings to treat an empty dataset as not ready, so that the downstream recipes will not run.
-
Hi @VitaliyD
I have already enabled the "Empty as not ready" option and tried it, but it still gives me an error saying "Input dataset <dataset name> is not ready, caused by: CodedIOException: in running <recipe_name>: Input partition NP of dataset <dataset name> is empty".
-
I created a new scenario with a "Run checks" step and selected the dataset to check. Then I ran the flow by clicking the "Build all" option in Flow actions. It still throws the same error: "Input partition is empty, Input dataset is not ready". Could you please help me find where I am going wrong here?
-
Miguel Angel Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 118 Dataiker
The Flow actions do not take any scenarios or their steps into account. You'd need to build the desired datasets by running the scenario itself.
A possible setup could be like this:
Step 1: A Build/Train step that builds the dataset that could be empty
Step 2: A Run checks step that checks whether the dataset built in step 1 is empty
Step 3: Another Build/Train step that builds the rest of the flow
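To illustrate why the order of the three steps matters, here is a small stand-alone Python sketch of the "stop at the first failed step" behaviour. It does not call DSS at all: `run_steps`, the step names and the fake `row_count` are all hypothetical and only mimic what the scenario does.

```python
# Stand-alone illustration (no DSS calls): run steps in order and stop
# as soon as one of them fails, like a scenario whose later steps are
# set not to run after a failed step.
def run_steps(steps):
    completed = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            # Later steps are skipped once a step fails.
            return completed, "%s failed: %s" % (name, exc)
        completed.append(name)
    return completed, None

row_count = 0  # hypothetical: pretend the upstream dataset came out empty

def build_upstream():
    pass  # stands in for step 1 (Build/Train)

def run_checks():
    # Stands in for step 2 (Run checks) with a row-count check.
    if row_count == 0:
        raise ValueError("dataset is empty")

def build_rest():
    pass  # stands in for step 3 (Build/Train on the rest of the flow)

completed, error = run_steps([
    ("build_upstream", build_upstream),
    ("run_checks", run_checks),
    ("build_rest", build_rest),
])
print(completed, error)
```

With an empty dataset only the first step completes, and the rest of the flow is never built.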
There is more information about scenarios and how helpful they can be in automating actions and checks in the help: https://doc.dataiku.com/dss/latest/scenarios/index.html
-
That's the purpose of the option. It raises an error so that the downstream execution doesn't happen. Isn't that what you wanted?
-
@VitaliyD
Thanks for your response. Actually, I do not want to execute the downstream recipes if any dataset is empty. At the same time, I do not want to see an error like "Job Failed", followed by error logs saying "Input is not ready.....".
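One possible way to get this behaviour (skip the downstream build without a failed job) is a custom Python step in a scenario that checks the row count itself and simply returns when the dataset is empty. This is only a sketch under assumptions: `build_unless_empty` and its parameters are hypothetical names, and the DSS wiring shown in the comments (`Scenario().build_dataset`, `dataiku.Dataset(...).get_dataframe()`) should be checked against your DSS version. I am also assuming that a Python step that returns normally is reported as successful, and that "Empty as not ready" is turned off on the upstream dataset so the build itself does not error.

```python
# Sketch: build the upstream dataset, then build the downstream ones only
# if it has rows. The DSS calls are injected so the logic can be tried
# outside DSS. Inside a custom Python scenario step, the wiring could be
# (assumption, verify for your version):
#   import dataiku
#   from dataiku.scenario import Scenario
#   scenario = Scenario()
#   build = scenario.build_dataset
#   count_rows = lambda name: len(dataiku.Dataset(name).get_dataframe())
# Note that get_dataframe() loads the whole dataset into memory.
def build_unless_empty(build, count_rows, upstream, downstream):
    build(upstream)
    if count_rows(upstream) == 0:
        # Return instead of raising, so nothing is reported as failed.
        print("%s is empty, skipping downstream builds" % upstream)
        return False
    for name in downstream:
        build(name)
    return True
```

The trade-off is that the scenario ends as a success even when nothing downstream was built, so you may want to log or report the skip somewhere visible.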