Outer Join with same columns
I am trying to combine two datasets and keep the same columns as output. Is there a way to do this without join + recipe?
Dataset 1: Previous Data
- Key
- Value
Dataset 2: New & Updated Data
- Key
- Value
I want to join the two, keep every row in Dataset 1 and update data if there is a matching key in Dataset 2, I also want to add every line from Dataset 2 that has no match in Dataset 1, both key and value.
Left Join will not add new lines from Dataset 2, Right Join will not keep non updated lines from Dataset 1. Outer Join gets me all the lines, but then I have to add a recipe to add compute column to check each line if key is empty in one set or the other and copy.
I was able to do that check and extra column directly from the join, by adding post join computed columns, but have not found how to drop the original selected columns and keep only my calculated columns, in order to skip the prepare step.
Can it be done directly in SQL? This is the SQL from the join:
SELECT "d1_Key" AS "d1_Key", "d1_Value" AS "d1_Value", "d2_Key" AS "d2_Key", "d2_Value" AS "d2_Value", CASE WHEN "d1_Key" IS NULL THEN "d2_Key" ELSE "d1_Key" END AS "Key", CASE WHEN "d2_Value" IS NULL THEN "d1_Value" ELSE "d2_Value" END AS "Value" FROM ( SELECT "Dataset 1"."Key" AS "d1_Key", "Dataset 1"."Value" AS "d1_Value", "Dataset 2"."Key" AS "d2_Key", "Dataset 2"."Value" AS "d2_Value", FROM "schema"."table1" "Dataset 1" FULL OUTER JOIN "schema"."table2" "Dataset 2" ON ("Dataset 1"."Key" = "Dataset 2"."Key") ) "withoutcomputedcols_query"
Thanks!
Answers
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,125 Neuron
Of course it can be done in SQL. Use a SQL recipe instead of a Visual Join recipe then you have the SQL you want.