The 'exec_recipe_fragment' method of the 'SQLExecutor2' class fails to execute

radiantly
radiantly Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 6

Hello, I was exploring the following code example:

https://developer.dataiku.com/latest/tutorials/data-engineering/sql-in-code/index.html#code-recipe-examples

But I got an error from the exec_recipe_fragment method (attached below). I tried both options of the overwrite_output_schema parameter but did not succeed. Any help to resolve this error would be great.

I am using a Postgres database as instructed in the example, on Dataiku Enterprise version 11.4.3.

Thanks.

Capture02.PNG

Capture03.PNG

Best Answer

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,024 Neuron
    edited July 17 Answer ✓

    This is because the exec_recipe_fragment() method is not meant to be executed in a Notebook but in a Recipe. So run your code in a Python Recipe and it will work fine. The whole point of the exec_recipe_fragment() method is that it pushes the execution of the query down to the database; the data never leaves your database. Therefore, even if you could run it in a Notebook, there would be nothing to inspect, as there is no dataframe with the data in memory.

    If you want to inspect the data in a Notebook you can use the query_to_df() method; there is an example shown in the link you posted. This method, as its name implies, copies the output of the SQL execution into a dataframe, which can be inspected in a Notebook. Of course query_to_df() will be slower than exec_recipe_fragment(), as it needs to load all the resulting data into memory and may even fail if you don't have enough memory to hold all your data. It also has the additional step of writing the data back to the database (see below). That's why exec_recipe_fragment() is preferred for performance reasons.
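
    For illustration, here is a minimal sketch of what the Python Recipe version could look like (the dataset names, connection and query below are only placeholders, adjust them to your own Flow):

    import dataiku
    from dataiku import SQLExecutor2

    # Recipe input and output datasets (placeholder names)
    input_dataset = dataiku.Dataset("NYC_trips")
    output_dataset = dataiku.Dataset("NYC_trips_zones")

    # Placeholder query; replace with your actual SQL
    query = "SELECT * FROM NYC_trips"

    # The query is pushed down and executed entirely in the database,
    # and the result lands directly in the output dataset's table
    executor = SQLExecutor2(dataset=input_dataset)
    executor.exec_recipe_fragment(output_dataset, query)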

    Finally, the query_to_df() example shown in the link you posted is incomplete, as you should also write the dataframe back to the output dataset to complete your recipe (should you still prefer to have the option of inspecting the data in a Notebook). This is a very common pattern when you use Python recipes and manipulate the data with Pandas dataframes. If you add the following lines below the query_to_df() line that executes your query, the resulting dataframe will be written back to the output dataset (assuming the output dataset is called "NYC_trips_zones"):

    # Write recipe outputs
    NYC_trips_zones = dataiku.Dataset("NYC_trips_zones")
    NYC_trips_zones.write_with_schema(df)
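
    Putting it together, the complete query_to_df() version could look something like this (again, the connection name and query are placeholders):

    import dataiku
    from dataiku import SQLExecutor2

    # Run the query and load the full result into a Pandas dataframe (in memory)
    executor = SQLExecutor2(connection="pg")  # placeholder connection name
    df = executor.query_to_df("SELECT * FROM NYC_trips")  # placeholder query

    # Write recipe outputs
    NYC_trips_zones = dataiku.Dataset("NYC_trips_zones")
    NYC_trips_zones.write_with_schema(df)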

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,024 Neuron

    Is your Dataiku connection named "pg"?

  • radiantly
    radiantly Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 6

    Thank you for your reply. You are correct to point out that my database connection name is not
    'pg'. I have rerun the notebook with the appropriate connection name but get the same error.
    It seems like the issue is with the "activityId" parameter in the Python API, though I am not
    sure what it is.

    Please note that I am able to run the query in a SQL Notebook and get the desired output.
    I have also run the same query in a SQL Script and was able to create the new table
    in the database, and I have checked that ${projectKey} is used consistently in the table names.

    Thank you.

    Capture01.PNG

  • Vividha
    Vividha Registered Posts: 1 ✭✭

    @Turribeach Even if the exec_recipe_fragment() method won't execute in a notebook, would the output table still get created, or does it have to be done through a recipe only? I am currently working on a pipeline where I need the exec_recipe_fragment() command to preprocess the raw data, which I can't do using query_to_df() due to its sheer volume. Is there a way to use that command in a notebook?
