Handling Empty or Missing Datasets Dynamically in Dataiku Python Recipes

Devlina Registered Posts: 1

Hi everyone,

I’m working on a dynamic process in Dataiku where I pull data from Athena using SQL, run an Athena UNLOAD, and use the resulting dataset as the input to a PySpark recipe. The challenge is that in certain scenarios (e.g., specific therapeutic areas) the Athena query returns an empty dataset. I want the recipe to handle both cases: the dataset may be empty, or it may contain data.

When the dataset is empty, the Python recipe throws the following error:

"Root path does not exist Error while connecting to dataset _____, caused by: DataStoreIOException: Root path of the dataset does not exist "

I’ve tried implementing fallback handling in my Python recipe to check for the dataset’s existence or handle it if it’s empty, but I’m still facing the same issue. Here’s what I’ve tried:

  • Used a try-except block to catch errors when reading the dataset.
  • Attempted to write an empty DataFrame to the output when the input is missing or empty.

However, the error seems to occur before the Python code executes, likely because Dataiku is unable to connect to the input dataset at all when it’s empty.

What I am looking for

  1. How can I handle scenarios where the input dataset might not exist or is empty without triggering this error?
  2. Is there a way to configure the dataset or Python recipe to proceed gracefully even when no data is available?

Please let me know if there is a solution for this issue.

Answers

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,239 Dataiker

    Hi,
    One option is to configure the dataset so that, when it is empty, it is treated as not ready, so the recipe doesn't fail.

    The other option is to pass ignore_flow=True when you create the dataset handle, so DSS doesn't check whether the dataset is empty. This means you can't declare the dataset as an input of the recipe. Instead, you access it directly from the recipe code with ignore_flow=True, and you can then handle a failure with try/except:

    import dataiku
    csv_prepared = dataiku.Dataset("csv_prepared", ignore_flow=True)
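
A minimal sketch of how the try/except part could look, assuming the read goes through pandas via `get_dataframe()`. The helper names (`read_or_fallback`, `load_input`, `empty_input`) and the column names are hypothetical, and the exact exception type raised by DSS is not assumed, so the sketch catches `Exception` broadly:

```python
def read_or_fallback(load, make_empty):
    """Return load(); if it raises, log the error and return make_empty()."""
    try:
        return load()
    except Exception as exc:  # exact DSS exception type is not assumed here
        print(f"Input could not be read, using empty fallback: {exc}")
        return make_empty()


def load_input():
    # Imported lazily because the dataiku package only exists inside DSS.
    import dataiku
    # ignore_flow=True: the dataset is not declared as a recipe input.
    ds = dataiku.Dataset("csv_prepared", ignore_flow=True)
    return ds.get_dataframe()


def empty_input():
    import pandas as pd
    # Hypothetical schema; replace with the columns the recipe expects.
    return pd.DataFrame(columns=["col_a", "col_b"])


# Inside the recipe:
# df = read_or_fallback(load_input, empty_input)
```

Keeping both the `Dataset(...)` call and the read inside `load_input` means a failure at either step ends up in the same fallback path, so the rest of the recipe can always work with a DataFrame.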

    Thanks
