SQLExecutor2 - query_to_df use

rona

Hello,

I would like to know how to use the query_to_df method with the following parameters:

query_to_df(query, pre_queries=None, post_queries=None, extra_conf={}, infer_from_schema=False, parse_dates=True, bool_as_str=False, dtypes=None, script_steps=None, script_input_schema=None, script_output_schema=None)

For instance, what is the use of pre_queries, post_queries, extra_conf, script_steps, etc.?

Some examples would be welcome to help me understand how best to use it.

I noticed that each execution opens a new DB connection, which may take several seconds each time. Is there a way to avoid this systematic connection opening? Ideally the connection would be opened only for the first query on a given DSS DB connection, and subsequent queries would simply reuse that same DB connection.

Annie

Best Answer

  • Marlan
    Answer ✓

    Hi Annie (@rona),

    I don't know of a way to tell DSS to reuse the connection across multiple calls to query_to_df. It's possible that the connection is cached, but it sounds like that is not your experience. It hasn't been an issue for me, so I haven't paid much attention to it.

    Depending on your need, you could package multiple queries into a single statement and union the results, or something like that (see the sketch below). But that doesn't work if you need the results of one query before running the next, which is more likely.
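    To illustrate the packaging idea, here is a minimal sketch. The connection name and table names are hypothetical, and the SQL would need to match your database:

        from dataiku import SQLExecutor2

        executor = SQLExecutor2(connection="my_sql_connection")  # hypothetical connection name

        # Two SELECTs combined into one statement so a single query_to_df call
        # (one connection) returns both result sets, tagged by a 'source' column.
        combined_query = """
        SELECT 'orders_2020' AS source, order_id, amount FROM orders_2020
        UNION ALL
        SELECT 'orders_2021' AS source, order_id, amount FROM orders_2021
        """

        df = executor.query_to_df(combined_query)

        # Split the combined result back into separate dataframes if needed
        df_2020 = df[df["source"] == "orders_2020"]
        df_2021 = df[df["source"] == "orders_2021"]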

    Good luck!

    Marlan

Answers

  • Marlan

    Hi @rona,

    The results of the query argument are what populate the dataframe; it must be a single SELECT statement. pre_queries is a list of queries or statements that are executed before the main query. For example, you could create a temporary table via pre_queries and use it in the main query. post_queries is likewise a list of queries or statements that are executed after the main query.
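
    For example, here is a sketch of a pre_queries / post_queries call. The connection name, table names, and temp-table DDL are hypothetical and depend on your database:

        from dataiku import SQLExecutor2

        executor = SQLExecutor2(connection="my_sql_connection")  # hypothetical connection name

        # Create a temporary table before the main query runs...
        pre_queries = ["""
        CREATE TEMPORARY TABLE tmp_recent AS
        SELECT customer_id, SUM(amount) AS total_amount
        FROM transactions
        WHERE transaction_date >= '2021-01-01'
        GROUP BY customer_id
        """]

        # ...and clean it up after the main query has been executed
        post_queries = ["DROP TABLE IF EXISTS tmp_recent"]

        # The main query must be a single SELECT; it can use the temp table
        df = executor.query_to_df(
            "SELECT customer_id, total_amount FROM tmp_recent WHERE total_amount > 1000",
            pre_queries=pre_queries,
            post_queries=post_queries,
        )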

    I commonly set infer_from_schema to True so that the resulting dataframe column types are set per the SQL query datatypes rather than being inferred by pandas.
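
    For instance (a sketch; the connection and table names are hypothetical):

        from dataiku import SQLExecutor2

        executor = SQLExecutor2(connection="my_sql_connection")  # hypothetical connection name

        # With infer_from_schema=True the dataframe column types follow the
        # SQL column types rather than being re-inferred by pandas
        df = executor.query_to_df(
            "SELECT customer_id, signup_date, is_active FROM customers",
            infer_from_schema=True,
        )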

    I've always left other arguments at their default values.

    Also, see the slide deck and example project at this link:

    https://community.dataiku.com/t5/Online-Events/Using-the-Dataiku-DSS-Python-API-for-Interfacing-with-SQL/ba-p/7797

    The slide deck describes the use of query_to_df, and the project includes examples.

    Marlan

  • rona

    Hello Marlan,

    Thanks a lot for this very interesting information! It helps me see how we can use the SQLExecutor API.

    Do you know if there is a way to reuse the same DB connection across two consecutive calls to the query_to_df method, to avoid the time required to open the DB connection each time?

    Best regards,

    Annie
