The 'exec_recipe_fragment' method of the 'SQLExecutor2' class fails to execute

radiantly
radiantly Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 6

Hello, I was exploring the following code example:

https://developer.dataiku.com/latest/tutorials/data-engineering/sql-in-code/index.html#code-recipe-examples

But I got an error from the exec_recipe_fragment method (attached below). I tried both options of the overwrite_output_schema parameter but did not succeed. Any help to resolve this error would be great.

I am using a Postgres database as instructed in the example, on Dataiku Enterprise version 11.4.3.

Thanks.

Capture02.PNG

Capture03.PNG

Best Answer

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,024 Neuron
    edited July 17 Answer ✓

    This is because the exec_recipe_fragment() method is not meant to be executed in a Notebook but in a Recipe. So run your code in a Python Recipe and it will work fine. The whole point of the exec_recipe_fragment() method is that it pushes the execution of the query down to the database; the data never leaves your database. Therefore, even if you could run it in a Notebook, there would be nothing to inspect, as there is no dataframe with the data in memory.

    If you want to inspect the data in a Notebook you can use the query_to_df() method; there is an example shown in the link you posted. This method, as its name implies, copies the output of the SQL execution into a dataframe, which can be inspected in a Notebook. Of course query_to_df() will be slower than exec_recipe_fragment(), as it needs to load all the resulting data into memory and may even fail if you don't have enough memory to hold all your data. It also has the additional step of writing the data back to the database (see below). That's why exec_recipe_fragment() is preferred for performance reasons.
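
    For illustration, here is a minimal sketch of what the Python Recipe version could look like (the dataset names, connection and query below are only placeholders, adjust them to your own Flow):

    import dataiku
    from dataiku import SQLExecutor2

    # Recipe input and output datasets (placeholder names)
    input_dataset = dataiku.Dataset("NYC_trips")
    output_dataset = dataiku.Dataset("NYC_trips_zones")

    # Placeholder query; replace with your actual SQL
    query = "SELECT * FROM NYC_trips"

    # The query is pushed down and executed entirely in the database,
    # and the result lands directly in the output dataset's table
    executor = SQLExecutor2(dataset=input_dataset)
    executor.exec_recipe_fragment(output_dataset, query)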

    Finally, the query_to_df() example shown in the link you posted is incomplete, as you should also write the dataframe back to the output dataset to complete your recipe (should you still prefer to have the option of inspecting the data in a Notebook). This is a very common pattern when you use Python recipes and manipulate the data with Pandas dataframes. If you add the following lines below the query_to_df() line that executes your query, the resulting dataframe will be written back to the output dataset (assuming the output dataset is called "NYC_trips_zones"):

    # Write recipe outputs
    NYC_trips_zones = dataiku.Dataset("NYC_trips_zones")
    NYC_trips_zones.write_with_schema(df)
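
    Putting it together, the complete query_to_df() version could look something like this (again, the connection name and query are placeholders):

    import dataiku
    from dataiku import SQLExecutor2

    # Run the query and load the full result into a Pandas dataframe (in memory)
    executor = SQLExecutor2(connection="pg")  # placeholder connection name
    df = executor.query_to_df("SELECT * FROM NYC_trips")  # placeholder query

    # Write recipe outputs
    NYC_trips_zones = dataiku.Dataset("NYC_trips_zones")
    NYC_trips_zones.write_with_schema(df)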

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,024 Neuron

    Is your Dataiku connection named "pg"?

  • radiantly
    radiantly Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 6

    Thank you for your reply. You are correct to point out that my database connection name is not
    'pg'. I have rerun the notebook with the appropriate connection name but get the same error.
    It seems like the issue is with the "activityId" parameter in the Python API, though I am not
    sure what it is.

    Please note that I am able to run the query in a SQL Notebook and get the desired output.
    I have also run the same query in a SQL Script and was able to create the new table
    in the database, and I have checked that ${projectKey} is used consistently in the table names.

    Thank you.

    Capture01.PNG

  • Vividha
    Vividha Registered Posts: 1 ✭✭

    @Turribeach Even if the exec_recipe_fragment() method won't execute in a notebook, would the output table still get created, or does it have to be done through a recipe only? I am currently working on a pipeline where I need the exec_recipe_fragment() command to preprocess the raw data, which I can't do using query_to_df() due to its sheer volume. Is there a way to use that command in a notebook?
