How do I check unjoined left / right records?

Solved!
XiShi
Level 1
How do I check unjoined left / right records?

New to dataiku & running into trouble trying to find a way to get the unjoined records. 

Something similar to L&R anchor in an Alteryx join tool (https://help.alteryx.com/20223/designer/join-tool)

 

0 Kudos
1 Solution
AshleyW
Dataiker

Hi @XiShi ,

Great question! You can get unjoined rows (also called an anti-join) using the Join recipe. Here's how:

  • Let's assume you have two datasets--customers (left) and transactions (right) --and you want to find the customers who aren't in the transactions dataset (the left antijoin / unjoined left rows) because they haven't bought anything recently, for example.
  • Use a Join recipe and set it up with a left join using a common key, likely customer_id. customers will be your left dataset and transactions will be your right dataset.
  • On the selected columns screen, keep the customer_id column from both datasets and rename them so you can tell them apart: customer_id_customer (left) and customer_id_transactions (right). You'll be setting up a filter using the values in the two key columns.
  • Apply a post-filter to keep only the rows where customer_id_transactions is empty. This will give you rows from customers, the left dataset, that don't have a matching record in transactions, the right dataset: the left unjoined rows or left anti-join. The result is our dataset of customers with no transactions.

Similarly, you could also opt for a full outer join like @tgb417 suggested and follow it with a split recipe to create three datasets, each corresponding to one section of the join venn diagram:

  • left key not empty and right key empty: left anti-join
  • left and right key not empty: inner join
  • left key empty and right key not empty: right anti-join

Cheers,

Ashley W.

View solution in original post

0 Kudos
4 Replies
tgb417

@XiShi 

Welcome to the Dataiku Community,  We are glad to have you with us.

If you are talking about using a visual recipe to do a Join,  There are a broad set of joins you can do including traditional SQL type Joins, Fuzzy Joins, and Geo Joins.

In a visual Join Recipe here is the list of types of joins you can do.  From your brief description is sounds like you want to do an outer join so you have all records from left data set only , joined data sets, and right dataset only.

List of visual Recipe Joins Outher, Left, Inner, Right, Cross and CustomList of visual Recipe Joins Outher, Left, Inner, Right, Cross and Custom

 

There are different visual recipes for windowing, grouping, pivoting and other choices.

If none of these work for you and you can write a bit of SQL you can create a SQL recipe to make whatever kind of join you underlying database infrastructure can support.  Not all databases support all kinds of join.

Hope this helps a bit.

--Tom
0 Kudos
AshleyW
Dataiker

Hi @XiShi ,

Great question! You can get unjoined rows (also called an anti-join) using the Join recipe. Here's how:

  • Let's assume you have two datasets--customers (left) and transactions (right) --and you want to find the customers who aren't in the transactions dataset (the left antijoin / unjoined left rows) because they haven't bought anything recently, for example.
  • Use a Join recipe and set it up with a left join using a common key, likely customer_id. customers will be your left dataset and transactions will be your right dataset.
  • On the selected columns screen, keep the customer_id column from both datasets and rename them so you can tell them apart: customer_id_customer (left) and customer_id_transactions (right). You'll be setting up a filter using the values in the two key columns.
  • Apply a post-filter to keep only the rows where customer_id_transactions is empty. This will give you rows from customers, the left dataset, that don't have a matching record in transactions, the right dataset: the left unjoined rows or left anti-join. The result is our dataset of customers with no transactions.

Similarly, you could also opt for a full outer join like @tgb417 suggested and follow it with a split recipe to create three datasets, each corresponding to one section of the join venn diagram:

  • left key not empty and right key empty: left anti-join
  • left and right key not empty: inner join
  • left key empty and right key not empty: right anti-join

Cheers,

Ashley W.

0 Kudos
XiShi
Level 1
Author

Thank you @AshleyW & @tgb417 for the suggestions!

The left join & post filter works well, and unfortunately my corporate account is on an engine that has the outer join option greyed out, I have raised an internal ticket to see how I can resolve that.

0 Kudos
AshleyW
Dataiker

Hi @XiShi ,

We've recently released a new version of Dataiku, 11.3, and you can now add additional output datasets to a Join recipe that contains all the unmatched rows from either a left, right, or inner join.

Cheers,

Ashley

0 Kudos