How to output to / update my snowflake table using Dataiku

Options
abalo006
abalo006 Registered Posts: 29

I have a snowflake table and I've set up the connection and everything looks good, Dataiku requires me to create a dataset using that snowflake table that I can use as my input / output. The issue is I have that dataset as my output and when I run my flow, I can see my results, but it isn't actually outputting to my snowflake table directly.

After running my flow, my snowflake table is still empty, I thought the whole point of creating a connection using a data table was to be able to read / write to that table?

Am I understanding this wrong? is there any way I can set up my flow so that my results are outputting to my snowflake table / connection and actually writing to the table?


Operating system used: windows

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,716 Neuron
    Options

    This should work. Please post screen shots of your flow showing the inout dataset, the recipe and the output dataset and also from the recipe's Input/Output tab.

  • abalo006
    abalo006 Registered Posts: 29
    Options

    So first I selected my connected and table name (my table is currently empty) and I created a dataset from that

    CONNECTION.PNG

    Then I added a blank prepare step to my flow so that I could output to that new dataset I just made

    CONFIG.PNG

    I'm using a text input then a python step, before outputting to my snowflake dataset I ensured my column names / data types all matched my snowflake table / dataset

    FLOW.PNG

    When trying to run I get an SQL error saying that table already exists

    ERROR.PNG

    and when I explore that final output dataset it remains empty

    OUTPUT.PNG

    A bit stuck on where to go from here, I also find it strange that I don't see any option to choose between overwriting and appending to my output, the goal is to populate the table with historical data and then from then on, append the new data every time I run

    am I doing something wrong?

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,716 Neuron
    Options

    Yes I think you are doing something wrong. When you add a dataset via the Dataset menu this is called an "external dataset". These datasets are meant to be used for read only access, basically as inputs to recipes. You can't write to external datasets from visual recipes, you need to use a "managed dataset". To create a managed dataset go to your recipe input/output dataset section and create a NEW dataset in a Dataiku connection that has both the "Allow write" and "Allow managed datasets" options enabled. If the connection is not listed as an option then you don't have the options set or you don't have permissions to use the connection to write data to it.

  • abalo006
    abalo006 Registered Posts: 29
    Options

    ok that makes, so selected the dataset dropdown to view all dataset types, then selected "managed dataset"

    MANAGED.PNG

    then I created a new managed data set using my snowflake connection

    CONFIG.PNG

    then I've configured the managed data set using my connection / table so I think this might work

    MANAGE CONFIG.PNG

    but I don't see anything regarding "Allow write" and "Allow managed datasets" options enabled. Did I create the dataset the right way?

    I know you mentioned "go to your recipe input/output dataset section and create a NEW dataset" but wasn't sure what you meant by that so just wanted to confirm I'm doing this right.

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,716 Neuron
    Options

    "A Dataiku connection that has both the "Allow write" and "Allow managed datasets" options enabled". The options are on the dataiku connection, not the dataset.

    Are you changing the table name after creating the dataset? You can't write to an existing table like this and even if you did Dataiku will drop the table if the dataset schema changes. A managed dataset means it's managed by Dataiku, it will create and drop the table as required.

  • abalo006
    abalo006 Registered Posts: 29
    Options

    Ok that makes sense, but can you confirm where I'm supposed to be seeing those connection options? do you have a screenshot or something I can use as comparison? That way I can know for sure

    To your second point, my schema / field names should be the same, maybe I wasn't clear in my initial post, I have a table that's already created by my snowflake admin, the goal was just to populate that table using the output of my Dataiku flow, that way I can use that table as an input for an external visualization.

    You just said I can write to an existing table, so do I have to change the name and create a new table, and then use that new table as an input for my viz instead? and the how does that work if my flow is scheduled to run daily? the goal is to schedule the flow to run daily, updating my table every day, then use that table as a live connection to my external report, that way every time the table updates, my report also updates, and im still confused about the option to append vs overwrite is

Setup Info
    Tags
      Help me…