how to write multiple sql recipes data to common output table in Redshift

rsingamsetty
rsingamsetty Dataiku DSS Core Designer, Dataiku DSS & SQL, Registered Posts: 18 ✭✭✭✭✭

Hi

I am reading and transforming data from multiple Redshift tables using multiple SQL Recipes in mutiple flows in same project and want to append each output to a common table but once the output is selected in one flow

DSS is not allowing to choose same output dataset in another flow even though the output structure of all the SQL recipes are same. how can I make this possible

Answers

  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭

    Hi,

    From the description you provide, I recommend using a "Stack" recipe from your multiple "SQL recipes data" into one common output table.

    If that doesn't work in your specific case, could you please add a screenshot of your flow?

    Hope it helps,

    Alex

  • rsingamsetty
    rsingamsetty Dataiku DSS Core Designer, Dataiku DSS & SQL, Registered Posts: 18 ✭✭✭✭✭

    Hi Alex

    Thanks for replying, please find my use case below. All im trying to do is build an audit table (DSS_AUDIT_TABLE)to collect (record_count,data_load_date) for each source table.

    Below flow is what im trying to achieve.

    TABLE_1 --> SQL Recipe -->DSS_AUDIT_TABLE

    TABLE_2--->SQL Recipe--->DSS_AUDIT_TABLE

    currently below is the flow with a separate table for each source table collecting same information.

    basically im trying to do an insert into a common table

    TABLE_1 ---> SQL Recipe --->DSS_AUDIT_TABLE_1

    TABLE_2--->SQL Recipe ----> DSS_AUDIT_TABLE_2

  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭

    Hi,

    The Stack recipe solution I proposed earlier should work and respect the Directed Acyclic Graph nature of the flow:

    Screenshot 2020-05-14 at 12.22.30.png

    An alternative would be to use the SQL recipe in "script" variant: https://doc.dataiku.com/dss/latest/code_recipes/sql.htm. You would declare a dummy table as output but actually write to a central audit table which is not declared as output. I don't recommend this solution since it breaks the flow lineage and means you'll need to manage table create/drop statement yourself.

    Hope it helps,

    Alex

  • rsingamsetty
    rsingamsetty Dataiku DSS Core Designer, Dataiku DSS & SQL, Registered Posts: 18 ✭✭✭✭✭

    Hi Alex

    Thanks for replying but here input _table 1 loads on a different day of week (say Monday)

    and input_table 2 loads on say Wednesday, Do I still need to stack them up to load into a common table?

    will it not union the latest data of one table and previous /old load data of another table during every load

    and feed the common table?

  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭

    Hi,

    If you want to do incremental loading, you will need to partition your input datasets by day. In a nutshell, you would run the stack recipe every day for the partition of 'TODAY'. It will stack input datasets only if data has been loaded today.

    I invite you to read the following doc: https://doc.dataiku.com/dss/latest/partitions/index.html

    You can also join the upcoming webinar on partitions: https://community.dataiku.com/t5/Online-Events/Partitioning-in-Dataiku-DSS-May-20th/ba-p/6148

    Hope it helps,

    Alex

  • Marlan
    Marlan Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant, Neuron 2023 Posts: 319 Neuron

    Hello,

    Another option would be to create a separate dataset for each flow and then in the Settings for each dataset, change the name of the table to a one common name. So you'd have the following datasets DSS_AUDIT_TABLE_1, DSS_AUDIT_TABLE_2, and DSS_AUDIT_TABLE_3 but the SQL table name would be DSS_AUDIT_TABLE for all 3 datasets. Then you could use a SQL Code "Script" recipe in each flow to insert records into that same table . You'd need to create the table in advance of course and set the Output Dataset to Append for each flow.

    Marlan

  • rsingamsetty
    rsingamsetty Dataiku DSS Core Designer, Dataiku DSS & SQL, Registered Posts: 18 ✭✭✭✭✭

    Dear All

    When i am using stack recipe im seeing invalid recipe message near RUN button(using SQL engine Redshift)

    This issue I'm seeing when I'm trying to add more than 6 input datasets for the recipe. Till 6 inputs it is ok

    is there any limit on no of input datasets for stack recipe. I have 20 + tables to union and load into one table

  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭

    Hi,

    This is not expected, there are normally no limits enforced on the number of datasets to stack. I will need more information to diagnose what's going on.

    First, could you please send us a screenshot with the specific error?

    Cheers,

    Alex

Setup Info
    Tags
      Help me…