Added on January 9, 2025 8:22AM
Hi everyone,
I’m working on a dynamic process in Dataiku where I pull data from Athena using SQL, run an Athena UNLOAD, and use the resulting dataset as an input for a PySpark recipe. The challenge is that for certain scenarios (e.g., specific therapeutic areas), the Athena query returns an empty dataset. I want the recipe to handle both cases: the dataset may be empty, or it may contain data.
When the empty dataset happens, the Python recipe throws the following error:
"Root path does not exist Error while connecting to dataset _____, caused by: DataStoreIOException: Root path of the dataset does not exist "
I’ve tried adding fallback handling in my Python recipe to check whether the dataset exists or is empty, but I still hit the same error. Here’s what I’ve tried:
A try-except block to catch errors when reading the dataset. However, the error seems to occur before the Python code executes, likely because Dataiku cannot connect to the input dataset at all when it’s empty.
What I am looking for
Please let me know if there is a solution for this issue.
Hi,
One option is to have the dataset treated as not ready when it is empty, so the recipe doesn't fail on it.
The other option is to pass ignore_flow=True when interacting with the dataset, so DSS doesn't check whether the dataset is empty. This means you can't declare the dataset as an input of the recipe; instead, you access it directly from the recipe code with ignore_flow=True and handle any failure with try/except:
csv_prepared = dataiku.Dataset("csv_prepared", ignore_flow=True)
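A minimal sketch of how the try/except part could look, assuming the read fails with some exception when the root path is missing. The helper name load_or_default and the function read_csv_prepared are hypothetical, and catching the broad Exception class is an assumption, since the exact exception type raised by DSS is not confirmed here. In a PySpark recipe the read itself would likely go through the Spark integration rather than get_dataframe(), but the fallback pattern is the same.

```python
def load_or_default(loader, default, exceptions=(Exception,)):
    """Call loader(); if it raises one of `exceptions`, return `default`.

    Hypothetical helper: the broad Exception default is an assumption,
    narrow it once you know what DSS actually raises.
    """
    try:
        return loader()
    except exceptions as err:
        print(f"Input unavailable, using fallback: {err}")
        return default


def read_csv_prepared():
    # Import inside the function so this module also loads outside DSS.
    import dataiku

    # ignore_flow=True: the dataset is NOT declared as a recipe input,
    # so DSS does not validate it before the recipe code starts.
    ds = dataiku.Dataset("csv_prepared", ignore_flow=True)
    return ds.get_dataframe()


# Inside the recipe (None stands in for "no data this run"):
# df = load_or_default(read_csv_prepared, default=None)
# if df is None:
#     ...  # write an empty output or skip downstream logic
```

Returning a sentinel such as None keeps the "empty" branch explicit, so the rest of the recipe can decide whether to write an empty output or stop early.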
Thanks