Check out the first Dataiku 8 Deep Dive focusing on Productivity on October 29th Read More

SQLExecutor 2 - query_to_df use

Level 3
SQLExecutor 2 - query_to_df use

Hello,

I would like to know how to use the method query_to_df with the following parameters :

query_to_df(query, pre_queries=None, post_queries=None, extra_conf={}, infer_from_schema=False, parse_dates=True, bool_as_str=False, dtypes=None, script_steps=None, script_input_schema=None, script_output_schema=None)

For instance, what is the use of pre_queries, post_queries,extra_conf, script_steps ...?

Some examples would be welcome to understand how to use it in the best way.

I noticed that each execution generates a DB connection which may takes several seconds each time. Is there a way to avoid this systematic connection opening ? To have it only the first time we execute the first query on the same DSS DB connection ... and the other queries are just executed with the same DB connection.

Annie

0 Kudos
3 Replies
Neuron
Neuron

Hi @rona,

The results of query are what populate the dataframe. This must be a single select statement. pre_queries is a list of queries or statements that are executed before the main query is executed. For example, you could create a temporary table via pre_queries and use it in the main query. post_queries is likewise a list of queries or statements that would be executed after the main query.

I commonly set infer_from_schema to true so that the resulting dataframe types are set per the SQL query datatypes rather than being inferred by pandas.

I've always left other arguments at their default values.

Also, see slide deck and example project at this link:

https://community.dataiku.com/t5/Online-Events/Using-the-Dataiku-DSS-Python-API-for-Interfacing-with...

The slide deck describes use of query_to_df and the project includes examples of use.

Marlan

Level 3
Author

Hello Marlan,

Thanks a lot for these very interesting information ! It helps me to see how we can use the SQLExecutor API.

Do you know if there is a way to use the same DB connection by two consecutive calls to the query_to_df method ? to avoid the time required to open the DB connection ?

Best regards,

Annie

0 Kudos
Neuron
Neuron

Hi Annie (@rona),

I don't know of a way to tell DSS to reuse the connection for multiple calls to query_to_df. Possible that the connection is cached but sounds like that is not your experience. Hasn't been an issue for me so haven't paid much attention. 

Depending on your need, you could package multiple queries and union the results or something like that. But doesn't work if you need the results of a query prior to the next query which is more likely. 

Good luck!

Marlan 

0 Kudos