Running multiple SQL statements within a Python recipe to return a dataframe

info-rchitect
Level 6

Hi,

Here is some SQL that our internal Python recipe creates dynamically as it boots, to check whether the database, schema, and table have the proper permissions before performing a function.

 

show grants on database MYDB;
create or replace temporary table db_check as
    select EXISTS(
        select * from table(result_scan(-1)) where "privilege" = 'USAGE' and "granted_on" = 'DATABASE' and "grantee_name" = 'MYROLE'
    ) as db_ok
;
show grants on schema MYDB.MYSCHEMA;
create or replace temporary table schema_check as
    select EXISTS(
        select * from table(result_scan(-1)) where "privilege" = 'USAGE' and "granted_on" = 'SCHEMA' and "grantee_name" = 'MYROLE'
    ) as schema_ok
;
show grants on table MYDB.MYSCHEMA.MYTABLE;
create or replace temporary table table_check as
    select EXISTS(
        select * from table(result_scan(-1)) where "privilege" = 'SELECT' and "granted_on" = 'TABLE' and "grantee_name" = 'MYROLE'
    ) as table_ok
;
with combined as (
    select distinct db_ok, schema_ok, table_ok, (db_ok and schema_ok and table_ok) as rest_api_ok
    from db_check, schema_check, table_check
)
select * from combined

 

We typically do this sort of thing with the Snowflake Python connector, which uses external browser authentication when running on users' computers, or RSA keys. On Dataiku we cannot launch an external browser, and we are reluctant to set up the infrastructure to handle per-user keys. So we are left with the dynamic Snowflake connection we set up.
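
For context, outside Dataiku the connector-based approach looks roughly like this (a sketch only; the account, user, role, and warehouse values are placeholders, and the pandas extra of the connector is assumed to be installed):

import snowflake.connector

# Sketch: account, user, role, and warehouse are placeholder values.
con = snowflake.connector.connect(
    account='myaccount',
    user='me@example.com',
    authenticator='externalbrowser',  # pops an SSO browser window
    role='MYROLE',
    warehouse='MY_WH',
)

# execute_string runs a multi-statement script in a single session, so
# result_scan(-1) and the temporary tables retain state across statements.
cursors = con.execute_string(sql)    # sql holds the script shown above
df = cursors[-1].fetch_pandas_all()  # dataframe from the final SELECT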

My question is whether it is possible to execute such a SQL script from Python using SQLExecutor2. Both of its methods, exec_recipe_fragment and query_to_df, take a single SQL statement. Both accept a pre_queries argument, but in my experience these pre_queries are run in independent Snowflake sessions and cannot retain state. Is this true? If so, is there any other workaround?


Operating system used: Windows 10

6 Replies
Turribeach

Try adding post_queries=['COMMIT'] to your query_to_df call.

https://developer.dataiku.com/latest/concepts-and-examples/sql.html#queries-with-side-effects

 

info-rchitect
Level 6
Author

@Turribeach Did that and got an error:

Actual statement count 7 did not match the desired statement count 1

The code is:

 

from dataiku.core.sql import SQLExecutor2
dku_connector = SQLExecutor2(connection='MYCONNECTION')

sql = """show grants on database MYDB;
create or replace temporary table db_check as
    select EXISTS(
        select * from table(result_scan(-1)) where "privilege" = 'USAGE' and "granted_on" = 'DATABASE' and "grantee_name" = 'MYROLE'
    ) as db_ok
;
show grants on schema MYDB.MYSCHEMA;
create or replace temporary table schema_check as
    select EXISTS(
        select * from table(result_scan(-1)) where "privilege" = 'USAGE' and "granted_on" = 'SCHEMA' and "grantee_name" = 'MYROLE'
    ) as schema_ok
;
show grants on table MYDB.MYSCHEMA.MYTABLE;
create or replace temporary table table_check as
    select EXISTS(
        select * from table(result_scan(-1)) where "privilege" = 'SELECT' and "granted_on" = 'TABLE' and "grantee_name" = 'MYROLE'
    ) as table_ok
;
with combined as (
    select distinct db_ok, schema_ok, table_ok, (db_ok and schema_ok and table_ok) as rest_api_ok
    from db_check, schema_check, table_check
)
select * from combined;"""

df = dku_connector.query_to_df(sql, post_queries=['COMMIT'])

 

Marlan

Hi @info-rchitect,

Try passing all of the statements that create the temp tables in the pre_queries argument (with each statement as an item in a list). The final query would go in the first argument to the query_to_df method.
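
For example, with your statements it could look something like this (an untested sketch; the schema and table checks are elided since they follow the same pattern as the database check):

from dataiku import SQLExecutor2

executor = SQLExecutor2(connection='MYCONNECTION')

# Each setup statement is one item in the list; no trailing semicolons.
pre_queries = [
    "show grants on database MYDB",
    """create or replace temporary table db_check as
        select EXISTS(
            select * from table(result_scan(-1))
            where "privilege" = 'USAGE' and "granted_on" = 'DATABASE'
              and "grantee_name" = 'MYROLE'
        ) as db_ok""",
    # ... repeat the show/create pair for schema_check and table_check ...
]

final_query = """
with combined as (
    select distinct db_ok, schema_ok, table_ok,
           (db_ok and schema_ok and table_ok) as rest_api_ok
    from db_check, schema_check, table_check
)
select * from combined"""

df = executor.query_to_df(final_query, pre_queries=pre_queries)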

Marlan

 

info-rchitect
Level 6
Author

@Marlan Thanks for the pre_queries suggestion; I have used it in the past for independent SQL statements. In my experience these pre_queries are executed in independent Snowflake sessions, with no state retention, so I had no luck querying the created tables in the final SQL statement.

Marlan

Hi @info-rchitect,

Hmm... that hasn't been our experience. But you may have a different setup than we do (we also use Snowflake). We have no problem accessing tables created via pre_queries in the final select query.

For example, the following snippet returns 97:

from dataiku import SQLExecutor2

executor = SQLExecutor2(connection='SNOWFLAKE_DATABASE_CONNECTION')
query = "SELECT MAX(C1) AS C1_MAX FROM TMP_97"
pre_queries = ["CREATE TEMP TABLE TMP_97 AS SELECT 97 AS C1"]
df = executor.query_to_df(query, pre_queries=pre_queries)
print(df.iat[0, 0])  # prints 97

In this case, a commit in post_queries isn't needed because the created table is temporary anyway.

Marlan

info-rchitect
Level 6
Author

Hi Marlan,

OK, so I tried pre_queries with my example and it worked. I went back to my previous failures with pre_queries; they involved very long-running pre-statements and were on v11. I wonder if there is some timeout for each pre-query that I was exceeding previously. Thanks for the help!

regards