How to output to / update my Snowflake table using Dataiku
I have a Snowflake table and I've set up the connection; everything looks good. Dataiku requires me to create a dataset from that Snowflake table to use as my input/output. The issue is that I have that dataset as my output, and when I run my flow I can see my results, but nothing is actually written to my Snowflake table.
After running my flow, my Snowflake table is still empty. I thought the whole point of creating a connection to a table was to be able to read from and write to it?
Am I understanding this wrong? Is there any way to set up my flow so that my results are written through the connection to my Snowflake table?
Operating system used: Windows
Answers
-
Turribeach (Neuron)
This should work. Please post screenshots of your flow showing the input dataset, the recipe, and the output dataset, and also of the recipe's Input/Output tab.
-
So first I selected my connection and table name (my table is currently empty) and created a dataset from that.
Then I added a blank Prepare step to my flow so that I could output to the new dataset I just made.
I'm using a text input and then a Python step; before outputting to my Snowflake dataset I made sure my column names and data types all matched my Snowflake table/dataset (see the sketch at the end of this post).
When I try to run it I get a SQL error saying the table already exists,
and when I explore that final output dataset it remains empty.
I'm a bit stuck on where to go from here. I also find it strange that I don't see any option to choose between overwriting and appending to my output; the goal is to populate the table with historical data and then append the new data every time the flow runs.
Am I doing something wrong?
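For reference, my Python step is essentially the following sketch (the dataset names and columns here are placeholders, not my real ones):

    import dataiku

    # Read the text input dataset into a pandas DataFrame
    input_ds = dataiku.Dataset("my_text_input")
    df = input_ds.get_dataframe()

    # ... transformations happen here ...

    # Write to the Snowflake output dataset; write_with_schema
    # also (re)sets the dataset schema from the DataFrame columns
    output_ds = dataiku.Dataset("my_snowflake_output")
    output_ds.write_with_schema(df)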
-
Turribeach (Neuron)
Yes, I think you are doing something wrong. When you add a dataset via the Dataset menu, it is called an "external dataset". These datasets are meant for read-only access, basically as inputs to recipes. You can't write to external datasets from visual recipes; you need a "managed dataset". To create a managed dataset, go to your recipe's input/output dataset section and create a NEW dataset on a Dataiku connection that has both the "Allow write" and "Allow managed datasets" options enabled. If the connection is not listed as an option, then either those options are not set or you don't have permission to write data to the connection.
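The UI is the normal route, but for completeness the same thing can be done with the Dataiku Python API. A minimal sketch, assuming a project key of MY_PROJECT and a connection named my_snowflake_connection (both placeholders):

    import dataiku

    # api_client() works from inside DSS; use dataikuapi.DSSClient(host, api_key) externally
    client = dataiku.api_client()
    project = client.get_project("MY_PROJECT")

    # Create a managed dataset stored on the Snowflake connection
    builder = project.new_managed_dataset("my_managed_output")
    builder.with_store_into("my_snowflake_connection")
    dataset = builder.create()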
-
OK, that makes sense. So I selected the dataset dropdown to view all dataset types, then selected "managed dataset".
Then I created a new managed dataset using my Snowflake connection,
and I've configured the managed dataset using my connection/table, so I think this might work.
But I don't see anything about the "Allow write" and "Allow managed datasets" options being enabled. Did I create the dataset the right way?
I know you mentioned "go to your recipe input/output dataset section and create a NEW dataset", but I wasn't sure what you meant by that, so I just wanted to confirm I'm doing this right.
-
Turribeach (Neuron)
"A Dataiku connection that has both the "Allow write" and "Allow managed datasets" options enabled". The options are on the dataiku connection, not the dataset.
Are you changing the table name after creating the dataset? You can't write to an existing table like this and even if you did Dataiku will drop the table if the dataset schema changes. A managed dataset means it's managed by Dataiku, it will create and drop the table as required.
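If you have admin access, you can also check those two flags programmatically. A minimal sketch using the connection definition API; the connection name is a placeholder, and I believe the definition keys are allowWrite and allowManagedDatasets, but verify against your DSS version:

    import dataiku

    # Reading connection definitions requires admin privileges
    client = dataiku.api_client()
    conn = client.get_connection("my_snowflake_connection")
    definition = conn.get_definition()

    # Both should be True for managed output datasets on this connection
    print("Allow write:", definition.get("allowWrite"))
    print("Allow managed datasets:", definition.get("allowManagedDatasets"))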
-
OK, that makes sense, but can you confirm where I'm supposed to see those connection options? Do you have a screenshot or something I can use as a comparison, so I can know for sure?
On your second point, my schema/field names should be the same. Maybe I wasn't clear in my initial post: I have a table that was already created by my Snowflake admin, and the goal was just to populate that table using the output of my Dataiku flow, so that I can use the table as an input for an external visualization.
You just said I can't write to an existing table, so do I have to change the name and create a new table, and then use that new table as the input for my viz instead? And how does that work if my flow is scheduled to run daily? The goal is to schedule the flow to run daily, updating my table every day, then use that table as a live connection to my external report, so that every time the table updates, my report also updates. I'm also still confused about where the option to append vs. overwrite is.