Sync or SQL recipe
Hi there,
I am confused about when to use a Sync recipe and whether a SQL recipe would be better.
I have a Snowflake dataset in my flow. I see some people run a Sync recipe on it, without any partitioning or modifications, and then query the synced data to get the output.
But if I can get the same output by using a SQL query recipe directly, what is the point of using Sync before the SQL recipe? Are there any advantages to this practice?
Answers
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,212 Dataiker
Hi @MTALY,
The SQL engine is recommended wherever possible. Both the input and output datasets need to be in the SQL database. For SQL recipes, most visual recipes, and Prepare recipes made up entirely of SQL-translatable processors, it's best to stick with the SQL engine.
Even unloading using fast-path to cloud storage is usually slower than just using the SQL engine.
https://knowledge.dataiku.com/latest/kb/data-prep/where-compute-happens.html
For Snowflake, there is a native Spark integration as well: https://doc.dataiku.com/dss/latest/connecting/sql/snowflake.html#id.
I'm not sure what the real reason for using the Sync recipe was there. Perhaps it was to avoid creating intermediate datasets in Snowflake. Using SQL pipelines instead would usually be a better option for that, since intermediate datasets are virtualized: https://doc.dataiku.com/dss/latest/sql/pipelines/sql_pipelines.html
So really, the only case where you would need to Sync from Snowflake to cloud storage is when you have a read-only Snowflake connection.
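To make the comparison from the question concrete, here is a minimal sketch. It is not Dataiku or Snowflake code: it assumes an in-memory SQLite database as a stand-in for the Snowflake connection, and the table names (`orders`, `orders_synced`) and the query are hypothetical. It only illustrates that an unmodified copy followed by a query returns the same rows as querying the source table directly, at the cost of one extra table.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the Snowflake connection.
conn = sqlite3.connect(":memory:")

# Hypothetical source dataset.
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 10.0), (2, "US", 25.0), (3, "EU", 5.0)],
)

# Hypothetical query; the table name is filled in per option below.
QUERY = (
    "SELECT region, SUM(amount) AS total "
    "FROM {table} GROUP BY region ORDER BY region"
)

# Option A: "Sync" first (a plain, unmodified copy), then query the copy.
conn.execute("CREATE TABLE orders_synced AS SELECT * FROM orders")
via_sync = conn.execute(QUERY.format(table="orders_synced")).fetchall()

# Option B: query the source table directly, as a SQL query recipe would.
direct = conn.execute(QUERY.format(table="orders")).fetchall()

# Both options yield the same rows; option A just stores the data twice.
print(via_sync == direct)

conn.close()
```

In this sketch the copy adds nothing to the result, which matches the point above: when the data can stay in the database, the intermediate Sync step is only needed for reasons outside the query itself, such as a read-only connection.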