Join using not exists

Solved!
Ankur5289
Level 3

Hello everyone,

I have two datasets and am using the Join recipe. I want the output dataset to be populated as follows:

Keep only the records that are present in dataset1 (say, the left-hand side) and not present in dataset2.

I set the join condition on the two datasets' common column to "is different", but the job runs for a very long time. Admittedly, it is a large dataset with 500k records.

Is there a faster way to get this kind of result?

4 Replies
Manuel
Dataiker

Why not do a left join of dataset1 with dataset2, followed by a Prepare recipe that removes every row that has a value in a column originally from dataset2? The rows that remain are exactly those with no match in dataset2.
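For readers who want to see the logic outside the visual recipes, here is a minimal plain-Python sketch of the same two steps (left join, then keep only the rows where the dataset2 side is empty). It is an illustration only, not what the Join or Prepare recipes run internally. The helper names `left_join` and `keep_unmatched`, the `ds2_` column prefix, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def left_join(left_rows, right_rows, key, right_prefix="ds2_"):
    """Left join two lists of dicts on `key`.

    Columns coming from the right side are prefixed with `right_prefix`
    and set to None when a left row has no match.
    """
    # Index the right side by key so each lookup is a hash access,
    # instead of comparing every left row against every right row.
    index = defaultdict(list)
    for row in right_rows:
        index[row[key]].append(row)

    right_cols = sorted({col for row in right_rows for col in row})

    joined = []
    for left in left_rows:
        matches = index.get(left[key]) or [None]
        for match in matches:
            out = dict(left)
            for col in right_cols:
                out[right_prefix + col] = None if match is None else match.get(col)
            joined.append(out)
    return joined


def keep_unmatched(joined_rows, marker_col):
    """Keep only the rows whose right-side marker column is empty."""
    return [row for row in joined_rows if row[marker_col] is None]


# Hypothetical sample data: ids 1 and 3 exist only in dataset1.
dataset1 = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
]
dataset2 = [
    {"id": 2, "score": 10},
    {"id": 4, "score": 20},
]

joined = left_join(dataset1, dataset2, key="id")
# Use the join key copied from dataset2 as the marker: it is only
# empty when there was no match (assuming keys are never null).
only_in_dataset1 = keep_unmatched(joined, marker_col="ds2_id")
print([row["id"] for row in only_in_dataset1])
```

The join key brought over from dataset2 is the safest column to test, since other dataset2 columns may be legitimately empty even on matched rows. By contrast, a join on "is different" pairs each row with almost every row of the other dataset, so the intermediate result grows roughly with the product of the two row counts, which is likely why the original attempt ran so long.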

Ankur5289
Level 3
Author

Hello @Manuel, thanks for this. Let me try it. Do you have any predefined formula that can be used in this Prepare recipe?

Ankur5289
Level 3
Author

Great, this worked.

Manuel
Dataiker

Great
