Running multiple SQL statements within a Python recipe to return a dataframe

Options
info-rchitect
info-rchitect Registered Posts: 169 ✭✭✭✭✭✭
edited July 16 in Using Dataiku

Hi,

Here is some SQL our internal Python recipe creates dynamically as it boots to check of the database, schema and table have proper permissions before performing a function.

show grants on database MYDB;
create or replace temporary table db_check as
    select EXISTS(
        select * from table(result_scan(-1)) where "privilege" = 'USAGE' and "granted_on" = 'DATABASE' and "grantee_name" = 'MYROLE'
    ) as db_ok
;
show grants on schema MYDB.BYSCHEMA;
create or replace temporary table schema_check as
    select EXISTS(
        select * from table(result_scan(-1)) where "privilege" = 'USAGE' and "granted_on" = 'SCHEMA' and "grantee_name" = 'MYROLE'
    ) as schema_ok
;
show grants on table MYDB.MYSCHEMA.MYTABLE;
create or replace temporary table table_check as
    select EXISTS(
        select * from table(result_scan(-1)) where "privilege" = 'SELECT' and "granted_on" = 'TABLE' and "grantee_name" = 'MYROLE'
    ) as table_ok
;
with combined as (
    select distinct db_ok, schema_ok, table_ok, (db_ok and schema_ok and table_ok) as rest_api_ok
    from db_check, schema_check, table_check
)
select * from combined

We typically do this sort of thing with the Snowflake Python connector which uses external browser authentication when running on users' computers or using RSA keys. On Dataiku we cannot launch an external browser and are reluctant to setup the infrastructure to handle per user keys. So, we are left with the dynamic Snowflake connection we setup.

My question is if it is possible to execute such a SQL script from Python using the SQLExecutor2? Both of the methods exec_recipe_fragment and query_to_df take a single SQL statement. Both accept pre-queries argument but, from my experience, these pre_queries are run in independent Snowflake sessions, unable to retain state. Is this true? If so, is there any other workaround?


Operating system used: Windows 10

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,737 Neuron
    Options
  • info-rchitect
    info-rchitect Registered Posts: 169 ✭✭✭✭✭✭
    edited July 17
    Options

    @Turribeach
    Did that and got an error:

    Actual statement count 7 did not match the desired statement count 1

    code is:

    from dataiku.core.sql import SQLExecutor2
    dku_connector = SQLExecutor2(connection='MYCONNECTION')
    
    sql = """show grants on database MYDB;
    create or replace temporary table db_check as
        select EXISTS(
            select * from table(result_scan(-1)) where "privilege" = 'USAGE' and "granted_on" = 'DATABASE' and "grantee_name" = 'MYROLE'
        ) as db_ok
    ;
    show grants on schema MYDB.MYSCHEMA;
    create or replace temporary table schema_check as
        select EXISTS(
            select * from table(result_scan(-1)) where "privilege" = 'USAGE' and "granted_on" = 'SCHEMA' and "grantee_name" = 'MYROLE'
        ) as schema_ok
    ;
    show grants on table MYDB.MYSCHEMA.MYTABLE;
    create or replace temporary table table_check as
        select EXISTS(
            select * from table(result_scan(-1)) where "privilege" = 'SELECT' and "granted_on" = 'TABLE' and "grantee_name" = 'MYROLE'
        ) as table_ok
    ;
    with combined as (
        select distinct db_ok, schema_ok, table_ok, (db_ok and schema_ok and table_ok) as rest_api_ok
        from db_check, schema_check, table_check
    )
    select * from combined;"""
    
    df = dku_connector.query_to_df(sql, post_queries=['COMMIT'])

  • Marlan
    Marlan Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant, Neuron 2023 Posts: 317 Neuron
    Options

    Hi @info-rchitect
    ,

    Try passing all of the statements that create the temp tables in the pre_queries argument (with each statement an item in a list). The final query would go in the first argument to the query_to_df method.

    Marlan

  • info-rchitect
    info-rchitect Registered Posts: 169 ✭✭✭✭✭✭
    Options

    @Marlan
    Thanks for the pre-queries, I have used this in the past for independent SQL statements. My experience shows these pre-queries are executed in independent Snowflake sessions, resulting in no state retention so not luck trying to query any created tables in the final SQL statement.

  • Marlan
    Marlan Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant, Neuron 2023 Posts: 317 Neuron
    edited July 17
    Options

    Hi @info-rchitect
    ,

    Hmm... that hasn't been our experience. But you may have different set up than we do (we also use Snowflake). We have no problem accessing tables created via pre_queries in the final select query.

    For example, the following snippet returns 97:

    executor = SQLExecutor2(connection='SNOWFLAKE_DATABASE_CONNECTION')
    query =  "SELECT MAX(C1) AS C1_MAX FROM TMP_97"
    pre_queries = "CREATE TEMP TABLE TMP_97 AS SELECT 97 AS C1"
    df = executor.query_to_df(query, pre_queries=[pre_queries])
    print(df.iat[0, 0])

    In this case, a commit in post_queries isn't needed because the created table is temporary anyway.

    Marlan

  • info-rchitect
    info-rchitect Registered Posts: 169 ✭✭✭✭✭✭
    Options

    Hi Marlan,

    OK so I tried with pre-queries using my example and it worked. I went back to my previous failues using pre-queries and they were related to very long executing pre-statements and were on v11. I wonder if there is some timeout for each pre-query that I was exceeeding previously. Thanks for the help!

    regards

Setup Info
    Tags
      Help me…