How to compare two datasets and find the differences

pranauv

Hi All,

I am trying to build a project Flow that has two datasets, for example:
Dataset 1:
Name id
xxxx user1
yyyy user2

Dataset 2:
Name id
aaaa user3
xxxx user1
yyyy user2
zzzz user4

In both datasets the id column is unique. I need to compare the two datasets, and my output dataset should be:

Output dataset:
Name id
zzzz user4
aaaa user3

In other words, I need to compare dataset 2 with dataset 1 and put every user from dataset 2 that is not present in dataset 1 into an output dataset.
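Outside of Dataiku, this is an anti-join on the id column: keep the rows of dataset 2 whose id never appears in dataset 1. A minimal Python sketch, assuming both datasets fit in memory as lists of (Name, id) tuples; the function name `rows_missing_from` is hypothetical:

```python
# Rows copied from the example datasets above, as (Name, id) tuples.
dataset1 = [("xxxx", "user1"), ("yyyy", "user2")]
dataset2 = [("aaaa", "user3"), ("xxxx", "user1"), ("yyyy", "user2"), ("zzzz", "user4")]


def rows_missing_from(rows, other):
    """Return the rows whose id does not appear in `other` (an anti-join on id)."""
    other_ids = {row_id for _, row_id in other}  # set lookup is O(1) per row
    return [(name, row_id) for name, row_id in rows if row_id not in other_ids]


missing = rows_missing_from(dataset2, dataset1)
print(missing)  # [('aaaa', 'user3'), ('zzzz', 'user4')]
```

Row order follows dataset 2, so user3 comes before user4; the order in the expected output above does not matter for the comparison.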

I also want the opposite case: a second output dataset that compares dataset 1 with dataset 2 and contains the users from dataset 1 that are also present in dataset 2.

Output dataset:
Name id
xxxx user1
yyyy user2
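This second output is a semi-join on id: keep the rows of dataset 1 whose id also appears in dataset 2. A minimal Python sketch under the same in-memory assumption as before; `rows_present_in` is a hypothetical helper name:

```python
# Rows copied from the example datasets above, as (Name, id) tuples.
dataset1 = [("xxxx", "user1"), ("yyyy", "user2")]
dataset2 = [("aaaa", "user3"), ("xxxx", "user1"), ("yyyy", "user2"), ("zzzz", "user4")]


def rows_present_in(rows, other):
    """Return the rows whose id also appears in `other` (a semi-join on id)."""
    other_ids = {row_id for _, row_id in other}
    return [(name, row_id) for name, row_id in rows if row_id in other_ids]


common = rows_present_in(dataset1, dataset2)
print(common)  # [('xxxx', 'user1'), ('yyyy', 'user2')]
```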

Can you help me achieve this?

Answers

  • fchataigner2

    Hi,

    You can do a left outer join of dataset 2 (on the left) with dataset 1 (on the right), then add a post-filter in the Join recipe that keeps only the rows where right_id (or whatever name you give that column) is empty, i.e. the rows of dataset 2 that found no match in dataset 1.
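    To make the join-then-filter idea concrete, here is a plain Python sketch that mimics what the recipe computes. It is not Dataiku's API: the `left_outer_join` helper and the `right_Name` column are hypothetical, and an unmatched right side is represented as None where the recipe would leave the cell empty.

```python
# Example rows as dicts, mirroring the datasets in the question.
dataset1 = [{"Name": "xxxx", "id": "user1"}, {"Name": "yyyy", "id": "user2"}]
dataset2 = [
    {"Name": "aaaa", "id": "user3"},
    {"Name": "xxxx", "id": "user1"},
    {"Name": "yyyy", "id": "user2"},
    {"Name": "zzzz", "id": "user4"},
]


def left_outer_join(left, right, key="id"):
    """Keep every left row; attach the matching right row's columns, or None if there is no match."""
    right_by_key = {row[key]: row for row in right}  # ids are unique, so at most one match
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        joined.append({
            "Name": row["Name"],
            "id": row[key],
            "right_Name": match["Name"] if match is not None else None,
            "right_id": match[key] if match is not None else None,
        })
    return joined


joined = left_outer_join(dataset2, dataset1)
# Post-filter: keep only the rows where right_id is not defined.
only_in_dataset2 = [{"Name": r["Name"], "id": r["id"]} for r in joined if r["right_id"] is None]
print(only_in_dataset2)  # [{'Name': 'aaaa', 'id': 'user3'}, {'Name': 'zzzz', 'id': 'user4'}]
```

    Because dataset 2 is on the left, none of its rows are lost by the join, so the filter on the empty right_id is what isolates the unmatched users.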

    For your second output: if the ids are unique, an inner join returns only the rows that have a match in both datasets. You can get an inner join by changing the join type in the join settings dialog of the Join recipe.
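    A matching Python sketch of the inner join, again only an illustration of the semantics rather than the recipe itself (`inner_join` and the `right_Name` column are hypothetical). Keeping only the left-side columns afterwards gives the second output dataset:

```python
# Example rows as dicts, mirroring the datasets in the question.
dataset1 = [{"Name": "xxxx", "id": "user1"}, {"Name": "yyyy", "id": "user2"}]
dataset2 = [
    {"Name": "aaaa", "id": "user3"},
    {"Name": "xxxx", "id": "user1"},
    {"Name": "yyyy", "id": "user2"},
    {"Name": "zzzz", "id": "user4"},
]


def inner_join(left, right, key="id"):
    """Return one combined row per left row that has a match in `right`; unmatched rows are dropped."""
    right_by_key = {row[key]: row for row in right}  # unique ids, so one match at most
    return [
        {"Name": row["Name"], "id": row[key], "right_Name": right_by_key[row[key]]["Name"]}
        for row in left
        if row[key] in right_by_key
    ]


# Project back to the original columns for the output dataset.
in_both = [{"Name": r["Name"], "id": r["id"]} for r in inner_join(dataset1, dataset2)]
print(in_both)  # [{'Name': 'xxxx', 'id': 'user1'}, {'Name': 'yyyy', 'id': 'user2'}]
```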

