Sync or SQL recipe

MTALY
Level 1

Hi there,

I am confused about when to use a Sync recipe and whether a SQL recipe would be better.

I have a Snowflake dataset in my flow. I see some people apply a Sync recipe to it, without any partitioning or modifications, and then query the synced data to get the output.

But if I can get the same output by using a SQL query recipe directly, what is the point of using Sync before the SQL recipe? Are there advantages to this practice?

1 Reply
AlexT
Dataiker

Hi @MTALY,

The SQL engine is recommended where possible. To use it, the input and output datasets both need to be in the SQL database. It is available for SQL recipes, most visual recipes, and Prepare recipes made up entirely of SQL-translatable processors. In those cases, it's best to stick with the SQL engine.

Even unloading using fast-path to cloud storage is usually slower than just using the SQL engine. 

https://knowledge.dataiku.com/latest/kb/data-prep/where-compute-happens.html
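
As a rough illustration of the difference (this is not Dataiku code: Python's built-in sqlite3 stands in for Snowflake, and the `orders` table, its columns, and the values are made up), here is the same aggregation done in place versus after copying the data out first:

```python
import sqlite3

# Stand-in for the Snowflake connection (hypothetical table and data).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
warehouse.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("a", 10.0), ("b", 5.0), ("a", 2.5)],
)

QUERY = (
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY customer"
)

# Option 1: query in place. The output table is created inside the same
# database, so no rows leave it.
warehouse.execute(f"CREATE TABLE totals AS {QUERY}")
in_place = warehouse.execute(
    "SELECT customer, total FROM totals ORDER BY customer"
).fetchall()

# Option 2: sync first, then query. Every input row is copied to a second
# store before the same query runs there.
copy = sqlite3.connect(":memory:")
copy.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
rows = warehouse.execute("SELECT customer, amount FROM orders").fetchall()
copy.executemany("INSERT INTO orders VALUES (?, ?)", rows)
synced = copy.execute(QUERY).fetchall()

rows_moved = len(rows)  # rows that had to leave the source database
```

Both options give the same result, but the second one moves every input row out of the database first, which is the extra cost an unnecessary Sync step adds.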

For Snowflake, there is a native Spark integration as well: https://doc.dataiku.com/dss/latest/connecting/sql/snowflake.html#id

I'm not sure what the real reason for using the Sync recipe was there. Perhaps it was to avoid creating intermediate datasets in Snowflake. Using SQL pipelines instead would usually be a better option, since the intermediate datasets are virtualized: https://doc.dataiku.com/dss/latest/sql/pipelines/sql_pipelines.html

So really, the only case where you would need to Sync from Snowflake to cloud storage is when you have a read-only Snowflake connection, since the SQL engine needs to write its output to the same database.
