Running multiple SQL statements within a Python recipe to return a dataframe
Hi,
Here is some SQL that our internal Python recipe creates dynamically as it boots, to check whether the database, schema, and table have the proper permissions before performing a function:
show grants on database MYDB;

create or replace temporary table db_check as
select EXISTS(
    select * from table(result_scan(-1))
    where "privilege" = 'USAGE'
      and "granted_on" = 'DATABASE'
      and "grantee_name" = 'MYROLE'
) as db_ok;

show grants on schema MYDB.MYSCHEMA;

create or replace temporary table schema_check as
select EXISTS(
    select * from table(result_scan(-1))
    where "privilege" = 'USAGE'
      and "granted_on" = 'SCHEMA'
      and "grantee_name" = 'MYROLE'
) as schema_ok;

show grants on table MYDB.MYSCHEMA.MYTABLE;

create or replace temporary table table_check as
select EXISTS(
    select * from table(result_scan(-1))
    where "privilege" = 'SELECT'
      and "granted_on" = 'TABLE'
      and "grantee_name" = 'MYROLE'
) as table_ok;

with combined as (
    select distinct
        db_ok,
        schema_ok,
        table_ok,
        (db_ok and schema_ok and table_ok) as rest_api_ok
    from db_check, schema_check, table_check
)
select * from combined
We typically do this sort of thing with the Snowflake Python connector, which uses external browser authentication when running on users' computers, or RSA keys. On Dataiku we cannot launch an external browser, and we are reluctant to set up the infrastructure to handle per-user keys. So we are left with the dynamic Snowflake connection we set up.
My question is whether it is possible to execute such a SQL script from Python using SQLExecutor2. Both of the methods exec_recipe_fragment and query_to_df take a single SQL statement. Both accept a pre_queries argument but, from my experience, these pre_queries are run in independent Snowflake sessions and are unable to retain state. Is this true? If so, is there any other workaround?
Operating system used: Windows 10
Answers
-
Turribeach
Try adding post_queries=['COMMIT'] to your query_to_df call:
https://developer.dataiku.com/latest/concepts-and-examples/sql.html#queries-with-side-effects
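Something like this minimal sketch (the connection name and final query are placeholders):

from dataiku.core.sql import SQLExecutor2

executor = SQLExecutor2(connection='MYCONNECTION')

# Per the docs above, COMMIT in post_queries prevents side effects
# of the query (created tables, etc.) from being rolled back when
# the connection is returned.
df = executor.query_to_df("select * from combined", post_queries=['COMMIT'])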
-
@Turribeach
Did that and got an error: "Actual statement count 7 did not match the desired statement count 1"
The code is:
from dataiku.core.sql import SQLExecutor2

dku_connector = SQLExecutor2(connection='MYCONNECTION')

sql = """
show grants on database MYDB;

create or replace temporary table db_check as
select EXISTS(
    select * from table(result_scan(-1))
    where "privilege" = 'USAGE'
      and "granted_on" = 'DATABASE'
      and "grantee_name" = 'MYROLE'
) as db_ok;

show grants on schema MYDB.MYSCHEMA;

create or replace temporary table schema_check as
select EXISTS(
    select * from table(result_scan(-1))
    where "privilege" = 'USAGE'
      and "granted_on" = 'SCHEMA'
      and "grantee_name" = 'MYROLE'
) as schema_ok;

show grants on table MYDB.MYSCHEMA.MYTABLE;

create or replace temporary table table_check as
select EXISTS(
    select * from table(result_scan(-1))
    where "privilege" = 'SELECT'
      and "granted_on" = 'TABLE'
      and "grantee_name" = 'MYROLE'
) as table_ok;

with combined as (
    select distinct
        db_ok,
        schema_ok,
        table_ok,
        (db_ok and schema_ok and table_ok) as rest_api_ok
    from db_check, schema_check, table_check
)
select * from combined;
"""

df = dku_connector.query_to_df(sql, post_queries=['COMMIT'])
-
Marlan
Hi @info-rchitect,
Try passing all of the statements that create the temp tables in the pre_queries argument (with each statement as an item in a list). The final query would go in the first argument to the query_to_df method, as in the sketch below.
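For example, something like this untested sketch using the object names from your script (it assumes the pre_queries run in order in the same session, so result_scan(-1) still refers to the show grants statement immediately before it):

from dataiku.core.sql import SQLExecutor2

executor = SQLExecutor2(connection='MYCONNECTION')

# Each SHOW GRANTS must immediately precede the statement that reads
# result_scan(-1), so keep the pairs adjacent in the list.
pre_queries = [
    "show grants on database MYDB",
    """create or replace temporary table db_check as
       select EXISTS(
           select * from table(result_scan(-1))
           where "privilege" = 'USAGE'
             and "granted_on" = 'DATABASE'
             and "grantee_name" = 'MYROLE'
       ) as db_ok""",
    "show grants on schema MYDB.MYSCHEMA",
    """create or replace temporary table schema_check as
       select EXISTS(
           select * from table(result_scan(-1))
           where "privilege" = 'USAGE'
             and "granted_on" = 'SCHEMA'
             and "grantee_name" = 'MYROLE'
       ) as schema_ok""",
    "show grants on table MYDB.MYSCHEMA.MYTABLE",
    """create or replace temporary table table_check as
       select EXISTS(
           select * from table(result_scan(-1))
           where "privilege" = 'SELECT'
             and "granted_on" = 'TABLE'
             and "grantee_name" = 'MYROLE'
       ) as table_ok""",
]

# The final single statement goes in the first argument.
final_query = """
with combined as (
    select distinct
        db_ok,
        schema_ok,
        table_ok,
        (db_ok and schema_ok and table_ok) as rest_api_ok
    from db_check, schema_check, table_check
)
select * from combined
"""

df = executor.query_to_df(final_query, pre_queries=pre_queries)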
Marlan
-
@Marlan
Thanks for the pre_queries suggestion; I have used it in the past for independent SQL statements. My experience has been that these pre_queries are executed in independent Snowflake sessions, with no state retention, so no luck trying to query any created tables in the final SQL statement.
-
Marlan
Hi @info-rchitect,
Hmm... that hasn't been our experience. But you may have a different setup than we do (we also use Snowflake). We have no problem accessing tables created via pre_queries in the final select query.
For example, the following snippet returns 97:
executor = SQLExecutor2(connection='SNOWFLAKE_DATABASE_CONNECTION')

query = "SELECT MAX(C1) AS C1_MAX FROM TMP_97"
pre_queries = "CREATE TEMP TABLE TMP_97 AS SELECT 97 AS C1"

df = executor.query_to_df(query, pre_queries=[pre_queries])
print(df.iat[0, 0])
In this case, a commit in post_queries isn't needed because the created table is temporary anyway: it is scoped to the session, and the pre_queries and the final query share that session.
Marlan
-
Hi Marlan,
OK, so I tried pre_queries with my example and it worked. I went back to my previous failures using pre_queries: they involved very long-running pre-statements and were on v11. I wonder if there is some per-pre-query timeout that I was exceeding previously. Thanks for the help!
regards