Added on July 9, 2021 9:14AM
Hello everyone,
I have two datasets and am using the Join recipe. I want the output dataset to be populated as follows:
keep only those records that are present in dataset1 (say, the left-hand side) and not present in dataset2.
I set the join condition on the two datasets' common column to "is different" under "join on", but the operation runs for a long time. Admittedly, it is a large dataset with 500k records.
Is there a faster way to get this kind of result?
Why not do a left join on (dataset1, dataset2), followed by a Prepare recipe that deletes all records that have data in a column originally from dataset2?
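For anyone who wants to see the logic of that suggestion outside the visual recipes, here is a minimal sketch in pandas. It is not what the Join or Prepare recipe runs internally; the function name `left_only_rows`, the marker column `_in_right`, and the sample data are made up for illustration.

```python
import pandas as pd


def left_only_rows(left: pd.DataFrame, right: pd.DataFrame, key: str) -> pd.DataFrame:
    """Return the rows of `left` whose `key` value never appears in `right`."""
    marker = "_in_right"  # hypothetical helper column, assumed not to exist in `left`
    # Reduce the right side to its distinct keys plus a marker column,
    # so the left join cannot multiply rows.
    right_keys = right[[key]].drop_duplicates().assign(**{marker: True})
    # Left join: every left row is kept; the marker is empty where no match was found.
    joined = left.merge(right_keys, on=key, how="left")
    # Drop every row that picked up data from the right side.
    result = joined[joined[marker].isna()].drop(columns=marker)
    return result.reset_index(drop=True)


# Illustrative data only
dataset1 = pd.DataFrame({"id": [1, 2, 3, 4], "value": ["a", "b", "c", "d"]})
dataset2 = pd.DataFrame({"id": [2, 4, 5], "other": ["x", "y", "z"]})

only_in_dataset1 = left_only_rows(dataset1, dataset2, "id")
print(only_in_dataset1)
```

An equality join followed by a filter only compares rows that share a key, whereas an "is different" condition pairs each left row with nearly every right row, which is likely why the original attempt ran so long on 500k records.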
Great, this worked.
Great
Hi @Ankur5289,
We've recently released a new version of Dataiku, 11.3, and you can now add additional output datasets to a Join recipe that contain all the unmatched rows from either a left, right, or inner join. This will let you create a dataset with only the rows that, for example, are in the left dataset and not in the right one.
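As a rough analogy for what an extra "unmatched rows" output gives you, the pandas sketch below splits a left join into a matched and an unmatched result. This only illustrates the idea, not how Dataiku implements it; `split_left_join` and the sample data are hypothetical, and the sketch assumes the two inputs share no column names other than the join key.

```python
import pandas as pd


def split_left_join(left: pd.DataFrame, right: pd.DataFrame, key: str):
    """Left-join `left` to `right` and return (matched, unmatched) DataFrames."""
    # indicator=True adds a "_merge" column: "both" for matches, "left_only" otherwise.
    joined = left.merge(right, on=key, how="left", indicator=True)
    matched = joined[joined["_merge"] == "both"].drop(columns="_merge")
    # Unmatched rows carry no right-side data, so keep only the left columns.
    unmatched = joined.loc[joined["_merge"] == "left_only", left.columns]
    return matched.reset_index(drop=True), unmatched.reset_index(drop=True)


# Illustrative data only
left_df = pd.DataFrame({"id": [1, 2, 3, 4], "value": ["a", "b", "c", "d"]})
right_df = pd.DataFrame({"id": [2, 4, 5], "other": ["x", "y", "z"]})

matched_rows, unmatched_rows = split_left_join(left_df, right_df, "id")
print(unmatched_rows)
```

The unmatched result is the "in the left dataset and not in the right one" set asked about in the original question.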
Anti-joins/exclusion joins are on our roadmap.
Cheers,
Ashley