How to terminate the Dataiku flow if the output of any recipe is an empty dataset

vaishnavi Registered Posts: 40 ✭✭✭✭

How can I terminate the Dataiku flow if the output of any recipe is an empty dataset?

Answers

  • Miguel Angel
    Miguel Angel Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 118 Dataiker

    Hi,

    You can take advantage of scenarios to do so.

    One scenario step can be a 'Run checks' step on the dataset that could be empty. If the subsequent steps are set not to run when a prior step has failed, processing of the flow will stop if that dataset is indeed empty.

    The checks the scenario will look for are those selected in the dataset's Status > Checks page.

    To create a check, go to Status > Edit > Checks. There you can create a check on the number of rows in the dataset, for example.
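The row-count check described above boils down to a very small rule. As a minimal sketch of that logic only (the helper name `check_row_count` and the `OK`/`ERROR` labels are assumptions chosen for illustration, not Dataiku's API):

```python
from typing import Tuple


def check_row_count(row_count: int, minimum: int = 1) -> Tuple[str, str]:
    """Return an (outcome, message) pair for a dataset's row count.

    Hypothetical helper for illustration: the function name and the
    "OK"/"ERROR" labels are assumptions, not part of Dataiku's API.
    """
    if row_count < minimum:
        # Too few rows: report a failing outcome so later steps can react.
        return "ERROR", f"dataset has {row_count} rows, expected at least {minimum}"
    return "OK", f"dataset has {row_count} rows"
```

With the default `minimum=1`, an empty dataset is the only case that produces the failing outcome.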

  • VitaliyD
    VitaliyD Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer Posts: 102 Dataiker

    You can also use the "Empty as not ready" option in a dataset's advanced settings, so that an empty dataset is considered not ready and the downstream recipes will not run.

    [Attached screenshot: Screenshot 2022-09-06 at 08.49.12.png]

  • vaishnavi
    vaishnavi Registered Posts: 40 ✭✭✭✭

    Hi @VitaliyD
    I have already enabled the "Empty as not ready" option and tried it, but I still get an error saying:

    "Input dataset <dataset name> is not ready, caused by: CodedIOException: in running <recipe_name>: Input partition NP of dataset <dataset name> is empty"

  • vaishnavi
    vaishnavi Registered Posts: 40 ✭✭✭✭

    Hi @MiguelangelC

    I created a new scenario with a "Run checks" step and selected the dataset to check. Then I ran the flow by clicking the "build all" option in Flow actions. It still throws the same error: "Input partition is empty, Input dataset is not ready". Could you please help me find where I am going wrong?

  • Miguel Angel
    Miguel Angel Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 118 Dataiker

    The flow actions do not take scenarios or their steps into account. You need to build the desired datasets by running the scenario instead.

    A possible setup could be like this:

    Step 1: A Build/Train step that builds the dataset that could be empty

    Step 2: A Run checks step that checks whether the dataset built in Step 1 is empty

    Step 3: Another Build/Train step that builds the rest of the flow

    There is more information about scenarios, and how they help automate actions and checks, in the documentation: https://doc.dataiku.com/dss/latest/scenarios/index.html
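The three-step setup above relies on one behaviour: once a step fails, the steps after it do not run. The sketch below simulates only that ordering logic in plain Python. The names `run_steps` and `demo` and the status labels are hypothetical, and the lambdas are stand-ins for real scenario steps rather than calls into Dataiku:

```python
from typing import Callable, List, Tuple

# (step name, action returning True on success)
Step = Tuple[str, Callable[[], bool]]


def run_steps(steps: List[Step]) -> List[Tuple[str, str]]:
    """Run steps in order; once one fails, mark the remaining ones as skipped."""
    results: List[Tuple[str, str]] = []
    failed = False
    for name, action in steps:
        if failed:
            # A prior step failed, so this step is not executed at all.
            results.append((name, "SKIPPED"))
            continue
        ok = action()
        results.append((name, "SUCCESS" if ok else "FAILED"))
        failed = not ok
    return results


def demo(row_count: int) -> List[Tuple[str, str]]:
    """Simulate the three steps for a dataset with the given row count."""
    return run_steps([
        ("build dataset that could be empty", lambda: True),
        ("run checks", lambda: row_count > 0),  # fails when the dataset is empty
        ("build rest of flow", lambda: True),
    ])
```

Calling `demo(0)` models the empty-dataset case: the first step succeeds, the check step fails, and the final build is skipped.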

  • VitaliyD
    VitaliyD Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer Posts: 102 Dataiker

    That is the purpose of the option: it raises an error, and nothing downstream is executed. Isn't that what you wanted?

  • vaishnavi
    vaishnavi Registered Posts: 40 ✭✭✭✭

    @VitaliyD
    Thanks for your response. I do not want the downstream recipes to execute if any dataset is empty, but at the same time I do not want to see an error such as "Job Failed" with error logs saying "Input is not ready....."
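The requirement stated here, skipping the downstream build without reporting a failure, can be pictured as a guard that treats an empty input as a normal outcome. This is only a sketch of the control flow, not a confirmed Dataiku feature: `build_downstream_if_not_empty` is a hypothetical name, and both callables are placeholders for whatever counts rows and triggers the build in a given setup.

```python
from typing import Callable


def build_downstream_if_not_empty(
    count_rows: Callable[[], int],
    build_downstream: Callable[[], None],
) -> str:
    """Build downstream only when the upstream dataset has rows.

    An empty upstream dataset yields a normal "SKIPPED" result rather
    than an exception, so the caller does not see a failure.
    Both callables are hypothetical placeholders supplied by the caller.
    """
    if count_rows() == 0:
        return "SKIPPED"
    build_downstream()
    return "BUILT"
```

The difference from a failing check is that the empty case returns a status the caller can log and move past, instead of an error.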
