Join using not exists

Options
Ankur5289
Ankur5289 Partner, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered Posts: 27 Partner

Hello everyone,

i have a two data sets and using the join recipe. I want the dataset to be populated as such :

keep only those records which are available in only in dataset1 (say left hand side) and not available in dataset 2.

I used the join condition of the two datasets' common column using "join on" as is different . But the operation is running long time . Yes its a huge dataset with 500k records.

Is there any fastest way to get such kind of results?

Best Answer

  • Ankur5289
    Ankur5289 Partner, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered Posts: 27 Partner
    Answer ✓
    Options

    Hello @Manuel
    Thanks for this . let me try . do you have any predefined formulae to be used in this prepare recipe?

Answers

  • Manuel
    Manuel Alpha Tester, Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered Posts: 193 ✭✭✭✭✭✭✭
    Options

    Why not do a left Join on (dataset1, dataset2) followed by a Prepare recipe that deletes all records with data on a column originally from dataset2?

  • Ankur5289
    Ankur5289 Partner, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered Posts: 27 Partner
    Options

    Great, this worked,

  • Manuel
    Manuel Alpha Tester, Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered Posts: 193 ✭✭✭✭✭✭✭
    Options
  • AshleyW
    AshleyW Dataiker, Alpha Tester, Dataiku DSS Core Designer, Registered, Product Ideas Manager Posts: 161 Dataiker
    Options

    Hi @Ankur5289
    ,

    We've recently released a new version of Dataiku, 11.3, and you can now add additional output datasets to a Join recipe that contain all the unmatched rows from either a left, right, or inner join. This will let you create a dataset with only the rows that, for example, are in the left dataset and not in the right one.

    Anti-joins/exclusion joins are on our roadmap.

    Cheers,

    Ashley

Setup Info
    Tags
      Help me…