Snowflake datasets storing tables in Snowflake: how to change the connection when linked SQL recipes exist

Manyam Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 7 ✭✭✭

Hi, the problem we are facing is this:

Other teammates have built Snowflake datasets and SQL recipes. Currently, whenever we run the flow in Dataiku, all of the Snowflake datasets are written as tables in Snowflake, and we need to remove those stored tables. One approach we understand is the "change connection" method, and it works. But there are SQL recipes using those Snowflake datasets, and as far as I know SQL recipes need Snowflake datasets as inputs. It is a big flow, with roughly 40 to 50 SQL recipes. For the datasets that SQL recipes read from, we cannot change the connection from Snowflake to S3 or filesystem_managed, because those recipes fetch their data from the Snowflake datasets. These datasets all sit in the middle of the flow, so it is difficult to find a solution. The main issue is that data gets stored in Snowflake for every Snowflake dataset. If anyone has a solution for how to tackle this, please let me know. Thanks!
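
For context, below is a rough sketch (run from a notebook inside DSS) of how we have been trying to see which datasets sit on the Snowflake connection and which of them feed SQL recipes, since those are the ones whose connection we cannot simply change. The project key, connection name and dataset name are placeholders, and the exact field names returned by list_recipes() may differ between DSS versions, so please treat this only as a sketch:

    import dataiku

    client = dataiku.api_client()
    project = client.get_project("MY_PROJECT")   # placeholder project key
    SNOWFLAKE_CONN = "snowflake_conn"            # placeholder connection name

    # Datasets whose storage connection is the Snowflake connection
    snowflake_datasets = set()
    for item in project.list_datasets():
        name = item["name"]
        params = project.get_dataset(name).get_settings().get_raw().get("params", {})
        if params.get("connection") == SNOWFLAKE_CONN:
            snowflake_datasets.add(name)

    # SQL recipes that read those datasets: changing the connection of their
    # inputs would break them, so they need to be handled separately
    for recipe in project.list_recipes():
        inputs = [inp["ref"]
                  for role in recipe.get("inputs", {}).values()
                  for inp in role.get("items", [])]
        blocked = [i for i in inputs if i in snowflake_datasets]
        if blocked and recipe["type"].startswith("sql"):
            print(recipe["name"], "reads", blocked)

    # For Snowflake datasets that no SQL recipe depends on, the stored table
    # can be dropped after a run with clear(), e.g.:
    # project.get_dataset("SOME_DATASET").clear()   # placeholder dataset name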

Answers

  • Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,111 Neuron

    Hi, you are not explaining why you need to move away from Snowflake. Whoever made the decision to move away from Snowflake should have been aware that this is not a trivial change. Even if you might be able to convert most visual recipes to use an S3 file-based dataset, you will not be able to do that for SQL recipes, as you have found out: SQL recipes need a SQL engine to run on. But even having another SQL engine to replace Snowflake with still poses a challenge, because other SQL engines like PostgreSQL, Microsoft SQL Server or Databricks use different SQL dialects. For instance, each of those engines has its own set of SQL functions, supports different data types, and has different rules for table names, column names, schema names and so on. So even converting to another SQL engine could be a challenge, depending on how complex your SQL recipes are. Is it possible? In most cases it certainly should be possible to convert to another SQL engine. Will it be easy? Highly unlikely.

    If you don’t have another SQL engine to replace Snowflake with, you will need to convert those SQL recipes to other recipe types, such as Python code recipes or other visual recipes. This is a full rewrite exercise and it will certainly be a hard task to complete.
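
    To make the rewrite option concrete, a SQL recipe converted to a Python recipe would look roughly like the sketch below. The dataset names and the aggregation are placeholders rather than your actual logic; inside a DSS Python recipe the dataiku package is available by default:

        import dataiku

        # Read the input dataset into a DataFrame in the Dataiku server's memory
        orders = dataiku.Dataset("orders_prepared").get_dataframe()

        # Reproduce whatever the SQL recipe did, e.g. a GROUP BY with aggregates
        result = (
            orders.groupby("customer_id", as_index=False)
                  .agg(total_amount=("amount", "sum"),
                       n_orders=("order_id", "count"))
        )

        # Write the result to the output dataset, which can now be S3 / file based
        dataiku.Dataset("orders_by_customer").write_with_schema(result)

    Note that this pulls the whole input into memory on the Dataiku server, which leads directly to the performance point below.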

    The final point to make is that even if you manage to convert a complex flow such as yours to use only an S3 file-based connection, will it actually be able to run? Snowflake is a push-down SQL engine, meaning that the data always stays in the database. This allows Dataiku to offload the compute to the Snowflake server, which in turn can be scaled to handle large datasets. With an S3 file-based connection, all the work has to be done by the Dataiku server, with the data being moved in and out of the S3 input and output datasets. It’s quite likely that you will suffer huge performance problems on your S3 flow, depending on the volume of data and the size of your Dataiku server.

    Therefore it’s highly advisable that you reconsider your request to remove Snowflake from your flow. At the very least you should plan for a comparable engine to replace Snowflake, not just rely on an S3 file-based connection.
