How to terminate the Dataiku flow if the output of any recipe is an empty dataset

vaishnavi
Level 3
How can I terminate the Dataiku flow if the output of any recipe is an empty dataset?

MiguelangelC
Dataiker

Hi,

You can take advantage of scenarios to do so.

One scenario step can be a 'Run checks' step on the particular dataset that could be empty. If the subsequent steps are set so that they do not run when a prior step has failed, processing of the flow will stop if that dataset is indeed empty.

The checks the scenario will look for are those selected in the dataset's Status > Checks page.

To create a check, go to Status > Edit > Checks. There you can create a check on the number of rows in the dataset, for example.

vaishnavi
Level 3
Author

Hi @MiguelangelC 

I created a new scenario with a "Run checks" step and selected the dataset to check. Then I ran the flow by clicking the "Build all" option in Flow actions. It still throws the same error: "Input partition is empty, Input dataset is not ready". Could you please help me find where I am going wrong?

 

MiguelangelC
Dataiker

The flow actions do not take scenarios or their steps into account. You need to build the desired datasets by running the scenario itself.

A possible setup could be like this:

Step 1: A Build/Train step that builds the dataset that could be empty

Step 2: A Run checks step that checks whether the dataset built in step 1 is empty

Step 3: Another Build/Train step that builds the rest of the flow
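As a rough sketch only, the same build / check / build sequence could also be written as a custom Python scenario, which stops quietly instead of failing when the dataset is empty. The function names `build_unless_empty` and `run_in_dss` and the dataset names are hypothetical. `Scenario.build_dataset` and `Dataset.iter_dataframes` are used here as I understand the Dataiku Python API; please verify them against the documentation for your DSS version.

```python
from typing import Callable, Iterable


def build_unless_empty(
    build: Callable[[str], None],
    count_rows: Callable[[str], int],
    gate_dataset: str,
    downstream: Iterable[str],
) -> bool:
    """Build gate_dataset, then build downstream only if it has rows.

    Returns True if the downstream datasets were built, False if the
    gate dataset was empty. Nothing is raised in the empty case, so the
    caller can end the run without reporting a failure.
    """
    build(gate_dataset)                # Step 1: build the dataset that may be empty
    if count_rows(gate_dataset) == 0:  # Step 2: check whether it is empty
        return False
    for name in downstream:            # Step 3: build the rest of the flow
        build(name)
    return True


def run_in_dss(gate_dataset: str, downstream: Iterable[str]) -> bool:
    """Wire the logic above to DSS (assumed API, only works inside a scenario)."""
    import dataiku
    from dataiku.scenario import Scenario

    scenario = Scenario()

    def count_rows(name: str) -> int:
        # Count rows chunk by chunk so the whole dataset is never held in memory.
        chunks = dataiku.Dataset(name).iter_dataframes(chunksize=50000)
        return sum(len(chunk) for chunk in chunks)

    return build_unless_empty(scenario.build_dataset, count_rows, gate_dataset, downstream)
```

In the scenario's custom Python script, one would call something like `run_in_dss("maybe_empty", ["final_output"])` with real dataset names. Because the empty case returns instead of raising, the scenario should finish without a failed job, although this has to be confirmed on an actual instance.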

 

There is more information about scenarios, and how they can help automate actions and checks, in the documentation: https://doc.dataiku.com/dss/latest/scenarios/index.html

VitaliyD
Dataiker

You can also use the "Empty as not ready" option in a dataset's advanced settings. With it, an empty dataset is considered not ready, so the downstream recipes will not run.

vaishnavi
Level 3
Author

Hi @VitaliyD, I have already enabled the "Empty as not ready" option and tried it, but it still gives me an error saying:

"Input dataset <dataset name> is not ready, caused by: CodedIOException: in running <recipe_name>: Input partition NP of dataset <dataset name> is empty"

VitaliyD
Dataiker

That is the purpose of the option: it raises an error, and the downstream execution does not happen. Isn't that what you wanted?

vaishnavi
Level 3
Author

@VitaliyD Thanks for your response. Actually, I do not want the downstream to execute if any dataset is empty. At the same time, I do not want to see an error such as "Job Failed", with error logs saying "Input is not ready....."