Submit your inspiring success story or innovative use case to the 2022 Dataiku Frontrunner Awards! ENTER YOUR SUBMISSION

Outer Join with same columns

BeachBoy
Level 1
Outer Join with same columns

I am trying to combine two datasets and keep the same columns as output. Is there a way to do this without join + recipe?

Dataset 1: Previous Data

  • Key
  • Value

Dataset 2: New & Updated Data

  • Key
  • Value

I want to join the two, keep every row in Dataset 1 and update data if there is a matching key in Dataset 2, I also want to add every line from Dataset 2 that has no match in Dataset 1, both key and value.

Left Join will not add new lines from Dataset 2, Right Join will not keep non updated lines from Dataset 1. Outer Join gets me all the lines, but then I have to add a recipe to add compute column to check each line if key is empty in one set or the other and copy.

I was able to do that check and extra column directly from the join, by adding post join computed columns, but have not found how to drop the original selected columns and keep only my calculated columns, in order to skip the prepare step.

Can it be done directly in SQL? This is the SQL from the join:

SELECT 
    "d1_Key" AS "d1_Key",
    "d1_Value" AS "d1_Value",
    "d2_Key" AS "d2_Key",
    "d2_Value" AS "d2_Value",
    CASE WHEN "d1_Key" IS NULL THEN "d2_Key" ELSE "d1_Key" END AS "Key",
    CASE WHEN "d2_Value" IS NULL THEN "d1_Value" ELSE "d2_Value" END AS "Value"
  FROM (
    SELECT 
        "Dataset 1"."Key" AS "d1_Key",
        "Dataset 1"."Value" AS "d1_Value",
        "Dataset 2"."Key" AS "d2_Key",
        "Dataset 2"."Value" AS "d2_Value",
      FROM "schema"."table1" "Dataset 1"
      FULL OUTER JOIN "schema"."table2" "Dataset 2"
        ON ("Dataset 1"."Key" = "Dataset 2"."Key")
    ) "withoutcomputedcols_query"

 Thanks!

0 Kudos
0 Replies