Advanced Designer Learning Path is now live! Read More

Join that REMOVES matching rows

Level 1
Join that REMOVES matching rows
Hi, I have a dataset where some combination of columns results in a unique key:

1 A

1 B

2 B

2 C

3 A

The first column with numerical values represents users and the second column with characters are their transactions. Each user has a unique identifier, but it is repeated in the file for every transaction.

I am trying to create a dataset where all users that purchased a certain item, for instance A, are completely removed. If I just delete the rows containing A (e.g. using a Prepare recipe), User 1 is still in the dataset via purchase B. I had a thought that this could be accomplished by identifying the list of unique users who had purchased A, and then using that list of users as a filter on the original dataset. I am not sure of the best way to accomplish this -- it seemed like the simplest way might be to do something like a left join, based on the filtered list, but set the condition that the output dataset REMOVES the rows that match.

Is this possible, or is there a better method? Or can this only be accomplished using SQL?
0 Kudos
2 Replies


What you describe is often described as the "ANTI JOIN" concept in SQL.  I would recommend indeed to accomplish this using a SQL script recipe in Dataiku.

Unfortunately, there is not a common syntax for ANTI JOINs in SQL, as for regular LEFT/RIGHT/INNER/... JOINS. Depending on the flavour of SQL database you are using, you can look up the specific syntax you need. For instance, in PostgreSQL you can use a syntax like this:

FROM actor a
SELECT * FROM film_actor fa
WHERE a.actor_id = fa.actor_id

This is documented online in this article: For other SQL flavours, similar articles certainly exist to help you write the right script.

Hope it helps,


0 Kudos
Level 1
Thank you, Alex.
0 Kudos
Labels (2)
A banner prompting to get Dataiku DSS