Try your hand at analyzing royal sentiment in Dataiku DSS! Learn more

how to write multiple sql recipes data to common output table in Redshift

Level 3
how to write multiple sql recipes data to common output table in Redshift

Hi

I am reading and transforming data from multiple Redshift tables using multiple SQL Recipes in mutiple flows in same project and want to append each output to a common table but once the output is selected in one flow

DSS is not allowing to choose same output dataset in another flow even though the  output structure of all the SQL recipes are same. how can I make this possible

 

8 Replies
Dataiker
Dataiker

Hi,

From the description you provide, I recommend using a "Stack" recipe from your multiple "SQL recipes data" into one common output table.

If that doesn't work in your specific case, could you please add a screenshot of your flow?

Hope it helps,

Alex

 

Level 3
Author

Hi Alex

 

Thanks for replying, please find my use case below. All im trying to do is build an audit table (DSS_AUDIT_TABLE)to collect  (record_count,data_load_date) for each source table.

Below flow is what im trying to achieve.

TABLE_1 --> SQL Recipe -->DSS_AUDIT_TABLE

TABLE_2--->SQL Recipe--->DSS_AUDIT_TABLE

 

currently below is the flow with a separate table for each source table collecting same information.

basically im trying to do an insert into a common table

TABLE_1 ---> SQL Recipe --->DSS_AUDIT_TABLE_1

TABLE_2--->SQL Recipe ----> DSS_AUDIT_TABLE_2

Dataiker
Dataiker

Hi,

 The Stack recipe solution I proposed earlier should work and respect the Directed Acyclic Graph nature of the flow:

Screenshot 2020-05-14 at 12.22.30.png

An alternative would be to use the SQL recipe in "script" variant: https://doc.dataiku.com/dss/latest/code_recipes/sql.htm. You would declare a dummy table as output but actually write to a central audit table which is not declared as output. I don't recommend this solution since it breaks the flow lineage and means you'll need to manage table create/drop statement yourself.

Hope it helps,

Alex

Level 3
Author

Hi Alex

 

Thanks for replying but here input _table 1 loads on a different day of week (say Monday)

and input_table 2 loads on say Wednesday, Do I still need to stack them up to load into a common table?

will it not union the latest data of one table and previous /old load data of another table  during every load

and feed the common table?

Dataiker
Dataiker

Hi,

If you want to do incremental loading, you will need to partition your input datasets by day. In a nutshell, you would run the stack recipe every day for the partition of 'TODAY'. It will stack input datasets only if data has been loaded today.

I invite you to read the following doc: https://doc.dataiku.com/dss/latest/partitions/index.html

You can also join the upcoming webinar on partitions: https://community.dataiku.com/t5/Online-Events/Partitioning-in-Dataiku-DSS-May-20th/ba-p/6148

Hope it helps,

Alex

Level 3
Author

Dear All

 

When i am using stack recipe im seeing invalid recipe message near RUN button(using SQL engine Redshift)

This issue I'm seeing when I'm trying to add more than 6 input datasets for the recipe. Till 6 inputs it is ok

is there any limit on no of input datasets for stack recipe. I have 20 + tables to union and load into one table

0 Kudos
Dataiker
Dataiker

Hi,

This is not expected, there are normally no limits enforced on the number of datasets to stack. I will need more information to diagnose what's going on.

First, could you please send us a screenshot with the specific error?

Cheers,

Alex

0 Kudos
Level 3

Hello,

Another option would be to create a separate dataset for each flow and then in the Settings for each dataset, change the name of the table to a one common name.  So you'd have the following datasets DSS_AUDIT_TABLE_1, DSS_AUDIT_TABLE_2, and DSS_AUDIT_TABLE_3 but the SQL table name would be DSS_AUDIT_TABLE for all 3 datasets.  Then you could use a SQL Code "Script" recipe in each flow to insert records into that same table .  You'd need to create the table in advance of course and set the Output Dataset to Append for each flow.

Marlan