
How to output to / update my snowflake table using Dataiku

abalo006
Level 3

I have a Snowflake table and I've set up the connection; everything looks good. Dataiku requires me to create a dataset from that Snowflake table to use as my input/output. The issue is that when I use that dataset as my output and run my flow, I can see my results, but nothing is actually written to my Snowflake table.

After running my flow, my Snowflake table is still empty. I thought the whole point of creating a connection to a database table was to be able to read from and write to that table?

Am I understanding this wrong? Is there any way I can set up my flow so that my results actually write to my Snowflake table?


Operating system used: Windows

0 Kudos
6 Replies
Turribeach

This should work. Please post screenshots of your flow showing the input dataset, the recipe and the output dataset, and also of the recipe's Input/Output tab.

0 Kudos
abalo006
Level 3
Author

So first I selected my connection and table name (my table is currently empty) and created a dataset from that

CONNECTION.PNG

Then I added a blank Prepare step to my flow so that I could output to the new dataset I just made

CONFIG.PNG

I'm using a text input and then a Python step. Before outputting to my Snowflake dataset, I made sure my column names and data types all matched my Snowflake table/dataset.

FLOW.PNG
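As a side note, the "column names and data types all match" check described above can be automated before the write. Here's a minimal stdlib-only sketch; the column names and types are hypothetical placeholders, not the actual table's schema:

```python
# Hypothetical target schema for the Snowflake table (name, type) pairs.
EXPECTED_SCHEMA = [("ID", "int"), ("NAME", "string"), ("AMOUNT", "double")]

def schema_mismatches(expected, actual):
    """Return a list of human-readable differences between two schemas."""
    problems = []
    exp = dict(expected)
    act = dict(actual)
    for col in exp.keys() - act.keys():
        problems.append(f"missing column: {col}")
    for col in act.keys() - exp.keys():
        problems.append(f"unexpected column: {col}")
    for col in exp.keys() & act.keys():
        if exp[col] != act[col]:
            problems.append(f"type mismatch on {col}: {exp[col]} != {act[col]}")
    return problems

# Example: the Python step produced AMOUNT as a string instead of a double.
actual = [("ID", "int"), ("NAME", "string"), ("AMOUNT", "string")]
print(schema_mismatches(EXPECTED_SCHEMA, actual))
```

Running a check like this at the end of the Python step surfaces schema drift before the database write fails, rather than after.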

When I try to run it, I get an SQL error saying the table already exists.

ERROR.PNG

and when I explore that final output dataset, it remains empty

OUTPUT.PNG

I'm a bit stuck on where to go from here. I also find it strange that I don't see any option to choose between overwriting and appending to my output. The goal is to populate the table with historical data and then append the new data every time the flow runs.

Am I doing something wrong?

0 Kudos

Turribeach

Yes, I think you are doing something wrong. When you add a dataset via the Dataset menu, it is called an "external dataset". These datasets are meant for read-only access, basically as inputs to recipes. You can't write to external datasets from visual recipes; you need a "managed dataset". To create a managed dataset, go to your recipe's Input/Output tab and create a NEW dataset in a Dataiku connection that has both the "Allow write" and "Allow managed datasets" options enabled. If the connection is not listed as an option, then either those options aren't set on it or you don't have permission to write data to it.
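For what it's worth, the "table already exists" error above is exactly what happens when a tool that assumes it owns a table issues a CREATE against one that someone else already created. Here's a stand-in illustration using Python's sqlite3 module (not Snowflake, and not Dataiku's actual code, just the same SQL behaviour):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The admin already created the table (as in the Snowflake setup).
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")

# A tool that assumes it owns the table tries to create it again and fails,
# much like the "table already exists" SQL error in the screenshot.
try:
    conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
except sqlite3.OperationalError as e:
    print("create failed:", e)

# A "managed" approach drops and recreates the table instead, so the write
# always succeeds, at the cost of Dataiku owning the table's lifecycle.
conn.execute("DROP TABLE IF EXISTS sales")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
conn.execute("INSERT INTO sales VALUES (1, 9.99)")
print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])
```

This is why pointing an output dataset at a pre-existing admin-created table conflicts with the managed-dataset model: the tool wants to create (and drop) the table itself.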

0 Kudos
abalo006
Level 3
Author

OK, that makes sense. So I selected the dataset dropdown to view all dataset types, then selected "managed dataset"

MANAGED.PNG

Then I created a new managed dataset using my Snowflake connection

CONFIG.PNG

Then I configured the managed dataset with my connection and table, so I think this might work

MANAGE CONFIG.PNG

but I don't see anything regarding the "Allow write" and "Allow managed datasets" options. Did I create the dataset the right way?

I know you mentioned "go to your recipe input/output dataset section and create a NEW dataset", but I wasn't sure what you meant by that, so I just wanted to confirm I'm doing this right.

0 Kudos

"A Dataiku connection that has both the "Allow write" and "Allow managed datasets" options enabled". The options are on the dataiku connection, not the dataset.

Are you changing the table name after creating the dataset? You can't write to an existing table like this, and even if you could, Dataiku would drop the table if the dataset schema changes. A managed dataset means it's managed by Dataiku: it will create and drop the table as required.
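On the append-vs-overwrite question raised earlier: in Dataiku that choice lives in the output dataset's settings (an "Append instead of overwrite" option on SQL outputs, if memory serves), not in the recipe itself. At the SQL level the difference is drop-and-recreate versus a plain INSERT into the existing table, which can be sketched with sqlite3 as a stand-in for Snowflake:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def overwrite(rows):
    # Overwrite: drop and recreate, keeping only the latest run's rows.
    conn.execute("DROP TABLE IF EXISTS history")
    conn.execute("CREATE TABLE history (day TEXT, total REAL)")
    conn.executemany("INSERT INTO history VALUES (?, ?)", rows)

def append(rows):
    # Append: keep the existing rows and add the new run's rows.
    conn.execute("CREATE TABLE IF NOT EXISTS history (day TEXT, total REAL)")
    conn.executemany("INSERT INTO history VALUES (?, ?)", rows)

overwrite([("mon", 10.0)])   # initial historical load
append([("tue", 12.5)])      # first daily run adds a row
append([("wed", 9.0)])       # next daily run adds another
print(conn.execute("SELECT COUNT(*) FROM history").fetchone()[0])
```

For a daily-scheduled flow feeding a live report, the pattern above maps to one initial overwrite run for the historical backfill, then append mode for every scheduled run after that.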

0 Kudos
abalo006
Level 3
Author

OK, that makes sense, but can you confirm where I'm supposed to see those connection options? Do you have a screenshot or something I can use as a comparison? That way I'll know for sure.

To your second point, my schema and field names should be the same. Maybe I wasn't clear in my initial post: I have a table that was already created by my Snowflake admin, and the goal was just to populate that table using the output of my Dataiku flow, so that I can use it as an input for an external visualization.

You just said I can't write to an existing table, so do I have to change the name and create a new table, and then use that new table as the input for my viz instead? And how does that work if my flow is scheduled to run daily? The goal is to schedule the flow to run daily, updating my table every day, then use that table as a live connection to my external report, so that every time the table updates, my report also updates. I'm also still confused about where the option to append vs. overwrite is.

0 Kudos