Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Hello everyone,
i have a two data sets and using the join recipe. I want the dataset to be populated as such :
keep only those records which are available in only in dataset1 (say left hand side) and not available in dataset 2.
I used the join condition of the two datasets' common column using "join on" as is different . But the operation is running long time . Yes its a huge dataset with 500k records.
Is there any fastest way to get such kind of results?
Hello @Manuel Thanks for this . let me try . do you have any predefined formulae to be used in this prepare recipe?
Why not do a left Join on (dataset1, dataset2) followed by a Prepare recipe that deletes all records with data on a column originally from dataset2?
Hello @Manuel Thanks for this . let me try . do you have any predefined formulae to be used in this prepare recipe?
Great, this worked,
Great
Hi @Ankur5289 ,
We've recently released a new version of Dataiku, 11.3, and you can now add additional output datasets to a Join recipe that contain all the unmatched rows from either a left, right, or inner join. This will let you create a dataset with only the rows that, for example, are in the left dataset and not in the right one.
Anti-joins/exclusion joins are on our roadmap.
Cheers,
Ashley