I am reading and transforming data from multiple Redshift tables using multiple SQL recipes in multiple flows within the same project, and I want to append each output to a common table. However, once the output is selected in one flow,
DSS does not allow me to choose the same output dataset in another flow, even though the output structure of all the SQL recipes is the same. How can I make this possible?
From the description you provide, I recommend using a "Stack" recipe to combine the outputs of your multiple SQL recipes into one common output table.
If that doesn't work in your specific case, could you please add a screenshot of your flow?
Hope it helps,
Thanks for replying; please find my use case below. All I'm trying to do is build an audit table (DSS_AUDIT_TABLE) to collect (record_count, data_load_date) for each source table.
The flow below is what I'm trying to achieve:
TABLE_1 --> SQL Recipe --> DSS_AUDIT_TABLE
Currently, the flow below is what I have, with a separate table for each source table collecting the same information.
Basically, I'm trying to do an insert into a common table:
TABLE_1 --> SQL Recipe --> DSS_AUDIT_TABLE_1
TABLE_2 --> SQL Recipe --> DSS_AUDIT_TABLE_2
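For illustration, each per-table SQL recipe described above could be a minimal sketch along these lines (the table name and audit column names are taken from the description; the exact query is an assumption):

```sql
-- Hypothetical sketch of one per-table audit recipe.
-- Collects the row count and load date for a single source table.
SELECT
    'TABLE_1'    AS source_table,
    COUNT(*)     AS record_count,
    CURRENT_DATE AS data_load_date
FROM TABLE_1;
```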
The Stack recipe solution I proposed earlier should work and respect the Directed Acyclic Graph (DAG) nature of the flow.
An alternative would be to use the SQL recipe in its "script" variant: https://doc.dataiku.com/dss/latest/code_recipes/sql.htm. You would declare a dummy table as output but actually write to a central audit table which is not declared as an output. I don't recommend this solution, since it breaks the flow lineage and means you'll need to manage the table create/drop statements yourself.
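In that script variant, the body of the recipe would look roughly like this. This is only a sketch; the table and column names are assumptions based on the use case above, and the central audit table must already exist:

```sql
-- Sketch of a "script" SQL recipe body: the declared output is a dummy
-- table, while the real write goes to the undeclared central audit table.
INSERT INTO DSS_AUDIT_TABLE (source_table, record_count, data_load_date)
SELECT 'TABLE_1', COUNT(*), CURRENT_DATE
FROM TABLE_1;
```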
Hope it helps,
Thanks for replying, but here input_table_1 loads on one day of the week (say Monday)
and input_table_2 loads on, say, Wednesday. Do I still need to stack them to load into a common table?
Won't it union the latest data of one table with the previous/old load data of the other table during every load,
and feed that into the common table?
If you want to do incremental loading, you will need to partition your input datasets by day. In a nutshell, you would run the stack recipe every day for the partition of 'TODAY'. It will stack input datasets only if data has been loaded today.
I invite you to read the following doc: https://doc.dataiku.com/dss/latest/partitions/index.html
You can also join the upcoming webinar on partitions: https://community.dataiku.com/t5/Online-Events/Partitioning-in-Dataiku-DSS-May-20th/ba-p/6148
Hope it helps,
When I am using the Stack recipe, I see an "invalid recipe" message near the RUN button (using the Redshift SQL engine).
The issue appears when I try to add more than 6 input datasets to the recipe; up to 6 inputs it is OK.
Is there a limit on the number of input datasets for the Stack recipe? I have 20+ tables to union and load into one table.
This is not expected; there are normally no limits enforced on the number of datasets to stack. I will need more information to diagnose what's going on.
First, could you please send us a screenshot of the specific error?
Another option would be to create a separate dataset for each flow and then, in the Settings for each dataset, change the name of the underlying table to one common name. So you'd have the datasets DSS_AUDIT_TABLE_1, DSS_AUDIT_TABLE_2, and DSS_AUDIT_TABLE_3, but the SQL table name would be DSS_AUDIT_TABLE for all three. Then you could use a SQL "script" recipe in each flow to insert records into that same table. You'd need to create the table in advance, of course, and set the output dataset to Append for each flow.
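As a sketch, the shared table could be created up front with something like the following (the column names are assumptions based on the audit fields mentioned earlier in this thread):

```sql
-- Hypothetical DDL for the shared audit table, created once in advance
-- so that every flow can append into the same underlying Redshift table.
CREATE TABLE DSS_AUDIT_TABLE (
    source_table   VARCHAR(128),
    record_count   BIGINT,
    data_load_date DATE
);
```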