
SQL Execution Engine for recipes that have a SQL Query Dataset as input?

Neuron

I noticed that if I load a dataset in "SQL query" mode in DSS and then try to run a simple prepare recipe on that dataset (just dropping a single column), the SQL execution engine is no longer available (in-memory only) for the prepare recipe. I also tried selecting "virtualizable in build" under the Advanced tab, but that didn't seem to help.

Note: dropping a single column is not my intended usage; I just wanted to verify that an operation that should be supported by the SQL engine is in fact not.


6 Replies
Dataiker

Hi @tim-wright ,


When you create the prepare recipe, where does the data reside? In relational databases for both input and output?

If your output dataset is not in an RDBMS, the visual recipe cannot run in-database. If your input data is already in an RDBMS and you create a prepare recipe that drops a column, DSS generates a SQL script that runs directly against the existing dataset in the RDBMS, so the recipe can run in-database (SQL). However, if your output is, for example, a file, then the input data is first moved into DSS and the computation happens there, so you will see the in-memory engine instead.

Also, please note that some prepare-recipe processors are not SQL-compatible; if your recipe contains one, it cannot run in-database (SQL). You can see whether each step is SQL-compatible from the green circle shown in the picture.

Screenshot 2020-10-06 at 9.52.17 AM.png
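The decision rule above can be sketched as a small Python function. This is purely a hypothetical illustration of the rule as Judy describes it, not Dataiku's actual code; the function name and parameters are invented for the example:

```python
# Hypothetical sketch of the engine-selection rule described above.
# Not Dataiku's implementation; names and signature are illustrative.
def pick_engine(input_in_rdbms: bool,
                output_in_rdbms: bool,
                all_steps_sql_compatible: bool) -> str:
    """Return which engine a prepare recipe could run on."""
    if input_in_rdbms and output_in_rdbms and all_steps_sql_compatible:
        return "in-database (SQL)"
    return "in-memory (DSS)"

print(pick_engine(True, True, True))    # in-database (SQL)
print(pick_engine(True, False, True))   # in-memory (DSS): output is a file
print(pick_engine(True, True, False))   # in-memory (DSS): non-SQL step
```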

Hope this helps!

Judy

Neuron
Author

@Snoopy The data is in an RDBMS; in fact it is a SQL query dataset. When you click "add dataset" in DSS, you can choose a SQL data source (I'm using Snowflake). In the dataset-creation window there are two modes: "read a database table", which I believe is the preferred method, and "SQL query". I used the second (SQL query) to create my dataset, and it works just fine. See the image below:

sql_query dataset.PNG


However, if I try to run a subsequent prepare recipe on this dataset, I cannot select the SQL execution engine, even though I am certain the recipe is compatible with the SQL engine. It is a little unclear to me what the limitations of using a "SQL query" dataset, instead of referencing a table directly, are.

prepare_recipe.PNG


Dataiker

It’s the expected behavior. The visual recipes expect to have a table identifier and build a query around it. In full generality, it would be impossible to wrap your query (from the SQL query dataset) in a subquery for the visual recipe, because the result might be syntactically incorrect (for example, the SQL query can contain “;” or a CTE).
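The wrapping problem can be reproduced outside DSS. Here is a small Python sketch using an in-memory SQLite database as a stand-in for the warehouse (table and column names are invented for the example): wrapping a plain SELECT in a subquery works, but the same wrapping fails as soon as the user's query carries a trailing ";".

```python
import sqlite3

# In-memory SQLite as a stand-in for the SQL database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
con.execute("INSERT INTO t VALUES (1, 2)")

# A plain SELECT can safely be wrapped in a subquery:
user_query = "SELECT a, b FROM t"
wrapped = f"SELECT a FROM ({user_query}) AS sub"
print(con.execute(wrapped).fetchall())  # [(1,)]

# But the same wrapping breaks if the query ends with ";":
bad_query = "SELECT a, b FROM t;"
try:
    con.execute(f"SELECT a FROM ({bad_query}) AS sub")
except sqlite3.OperationalError as e:
    print("wrapping failed:", e)
```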

For this reason in particular, SQL query datasets are not advised. Prefer using a table plus a SQL query recipe.

If you can’t, you can create a view in your SQL database and then use a normal “table” dataset (DSS does not care whether the table is an actual table or a view; the SQL syntax is the same).
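The view workaround can also be demonstrated with SQLite standing in for the SQL database (the table, view, and column names here are illustrative): once the query logic lives in a view, downstream SQL treats it exactly like a table, so it can be wrapped in subqueries without issue.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, amount REAL, status TEXT)")
con.execute("INSERT INTO orders VALUES (1, 9.5, 'paid'), (2, 3.0, 'open')")

# Put the query logic in a view instead of a "SQL query" dataset:
con.execute(
    "CREATE VIEW paid_orders AS "
    "SELECT id, amount FROM orders WHERE status = 'paid'"
)

# The view now behaves like a table, so generated SQL can wrap it freely:
print(con.execute("SELECT id FROM (SELECT * FROM paid_orders)").fetchall())  # [(1,)]
```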


Neuron
Author

@Snoopy I was hoping the former (subquery construction) would be possible. I had considered creating the view in Snowflake myself, but I liked the idea of having the view logic (simple as it may be) embedded in the flow itself. Thanks for the answer!

Wow, I was completely ignorant of this behavior for datasets created with SQL queries! Thanks for the answer @Snoopy, and for the question @tim-wright

Neuron
Author

@Ignacio_Toledo 👍 Glad my ignorance on the topic could help you as well.