SQL Execution Engine for recipes that have a SQL Query Dataset as input?
I noticed if I load a dataset with "SQL query" mode in DSS, and then try to run a simple prepare recipe on that dataset (just drop a single column), the SQL Execution Engine is no longer available (In memory only) for the prepare recipe? I also tried selecting "virtualizable in build" under the Advanced tab but that didn't seem to help
Note: dropping a single column is not my intended usage, I just wanted to verify that an operation that should be supported by the SQL Engine is in fact not.
Best Answer
-
It’s the expected behavior. Think that the visual recipes expect to have a table identifier and build a query around that. In all generality, it would be impossible to wrap your query (from the SQL query dataset) in a subquery for the visual recipe as it might be syntactically incorrect (for example the SQL query can contain “;” or CTA).
For this reason, in particular, it is not advised to use SQL query datasets. Prefer using a table and a SQL query recipe.
If you can’t, maybe you can create a view in your SQL database and then use a normal “table” dataset (DSS does not care if the table is an actual table or a view, the SQL syntax is the same).
Answers
-
Hi @tim-wright
,When you are creating the "prepare recipe", where does the data reside? In Relational Databases for both inputs and outputs?
If your output data will be in a non-RDBMS, you would not be able to run visual recipes in RDBMS. If your input data is already in RDBMS and when you create a prepare recipe which drops a column, DSS will generate an SQL script which can directly run on the existing dataset in RDBMS thus you are able to run in-db (SQL). However, if your output data will be in a file for example, then the input data will be moved to DSS and perform computation in DSS, thus you will see in-memory.
Also, please note that prepare recipes have some operations that are not SQL compatible, and then you may not be able to run it in in-db(SQL). You would be able to see if the prepare operation is SQL compatible or not by the green circle in the picture.
Hope this helps!
Judy
-
tim-wright Partner, L2 Designer, Snowflake Advanced, Neuron 2020, Registered, Neuron 2021, Neuron 2022 Posts: 77 Partner
@Snoopy
The data is within an RDBMS - in fact it is an SQL Query dataset. When you click "add dataset" in DSS, you can choose a SQL data source (I'm using Snowflake). Within the dataset creation window there are 2 modes "read a database table", which is the preferred method I think, or "sql query". I used the second (sql query) to create my dataset. This works just fine. See the image below:However if I try to run a subsequent Prepare recipe on this dataset I cannot seem to use the SQL Execution engine even though I am certain the recipe should be compatible with the SQL Engine. It is a little unclear to me what the limitations of using a "sql query" for the dataset instead of referencing a table directly is/are.
-
Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron
Wow, I was completely ignorant about this behavior for datasets created with SQL queries! Thanks for answer @Snoopy
, and for the question @tim-wright -
tim-wright Partner, L2 Designer, Snowflake Advanced, Neuron 2020, Registered, Neuron 2021, Neuron 2022 Posts: 77 Partner
@Snoopy
I was hoping the former (subquery construction) would have been possible. I had considered creating the view within Snowflake myself, but liked the idea of having the view logic (simple enough as it may be) embedded in the flow itself. Thanks for the Answer! -
tim-wright Partner, L2 Designer, Snowflake Advanced, Neuron 2020, Registered, Neuron 2021, Neuron 2022 Posts: 77 Partner
@Ignacio_Toledo
Glad my ignorance on the topic could help you as well.