The 'exec_recipe_fragment' method of 'SQLExecutor2' class fails to perform

radiantly

Hello, I was exploring the following code example:

https://developer.dataiku.com/latest/tutorials/data-engineering/sql-in-code/index.html#code-recipe-e...

But I got an error from the exec_recipe_fragment method (screenshots attached below). I tried both options of the overwrite_output_schema parameter but did not succeed. Any help to resolve this error would be great.

I am using a PostgreSQL database as instructed in the example, on Dataiku Enterprise version 11.4.3.

Thanks.

 

Attachments: Capture02.PNG, Capture03.PNG


3 Replies
Turribeach

Is your Dataiku connection name called "pg"?

radiantly (Author)

Thank you for your reply. You are correct to point out that my database connection name is not
'pg'. I have rerun the notebook with the appropriate connection name but get the same error.
It seems like the issue is with the "activityId" parameter in the Python API, though I am not
sure what it is.

Please note that I am able to run the query in an SQL notebook and get the desired output.
I have also run the same query in an SQL script recipe and was able to create the new table
in the database, and I have checked that ${projectKey} is used consistently in the table names.

Thank you.

 

Attachment: Capture01.PNG
Turribeach

This is because the exec_recipe_fragment() method is not meant to be executed in a notebook but in a recipe. Run your code in a Python recipe and it will work fine. The whole point of exec_recipe_fragment() is that it pushes the execution of the query down to the database; the data never leaves your database. Therefore, even if you could run it in a notebook, there would be nothing to inspect, as there is no dataframe with the data in memory.

If you want to inspect the data in a notebook you can use the query_to_df() method; there is an example in the link you posted. This method, as its name implies, copies the output of the SQL execution into a dataframe, which can be inspected in a notebook. Of course, query_to_df() will be slower than exec_recipe_fragment(), as it needs to load all the resulting data into memory, and it may even fail if you don't have enough memory to hold all your data. It also has the additional step of writing the data back to the database (see below). That's why exec_recipe_fragment() is preferred for performance reasons.
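For reference, a minimal sketch of what this looks like inside a Python recipe (the dataset names and the query below are illustrative assumptions based on the tutorial, not your exact code):

import dataiku
from dataiku.core.sql import SQLExecutor2

# Output dataset of the Python recipe ("NYC_trips_zones" is assumed from the tutorial)
output_dataset = dataiku.Dataset("NYC_trips_zones")

# Illustrative query only; in practice use the join from the tutorial
query = """
SELECT t.*, z."Zone" AS pickup_zone
FROM "NYC_trips" t
LEFT JOIN "NYC_zones" z ON t."PULocationID" = z."LocationID"
"""

# Push the query down to the database: the result is written straight into
# the output dataset's table and nothing is loaded into Python memory
SQLExecutor2.exec_recipe_fragment(output_dataset, query)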

Finally, the query_to_df() example shown in the link you posted is incomplete, as you can also write the df back to the output dataset to complete your recipe (should you still prefer to have the option to inspect the data in a notebook). This is a very common pattern when you use Python recipes and manipulate the data with Pandas dataframes. If you add the following lines below the query_to_df() line, the resulting dataframe will be written back to the output dataset (assuming the output dataset is called "NYC_trips_zones"):

 

# Write recipe outputs
NYC_trips_zones = dataiku.Dataset("NYC_trips_zones")
NYC_trips_zones.write_with_schema(df)
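Putting it together, a minimal sketch of the notebook-friendly pattern (the "pg" connection name, the "NYC_trips_zones" output dataset and the query are assumptions taken from the tutorial):

import dataiku
from dataiku.core.sql import SQLExecutor2

# Run the query through the "pg" connection and load the result into a
# Pandas dataframe that can be inspected in the notebook
executor = SQLExecutor2(connection="pg")
df = executor.query_to_df("""
SELECT t.*, z."Zone" AS pickup_zone
FROM "NYC_trips" t
LEFT JOIN "NYC_zones" z ON t."PULocationID" = z."LocationID"
""")

print(df.head())  # inspect the data interactively

# Write recipe outputs: persist the dataframe to the output dataset so the
# same code also works when run as a Python recipe in the Flow
NYC_trips_zones = dataiku.Dataset("NYC_trips_zones")
NYC_trips_zones.write_with_schema(df)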