Cannot deduplicate join matches

DavidALI Registered Posts: 18 ✭✭✭✭✭

Hi,

I am trying to use Dataiku to do a bank reconciliation.

I have a dataset with my invoices (invoices) and a dataset with my bank statement (bank_statement).
invoices contains a column with the amount (invoices.amount) and one with the sent date (invoices.sent_date).
bank_statement contains a column with the amount (bank_statement.amount) and one with the payment date (bank_statement.payment_date).
I want to join every row of the invoices dataset with the row of bank_statement that has the same amount and the nearest date.

I tried a join recipe (left join) of invoices with bank_statement, with all of the following conditions:
invoices.amount = bank_statement.amount: match all values within a range of 1 (because of possible rounding on one side or the other)
invoices.sent_date = bank_statement.payment_date: match the nearest date, force a single match in case of equality (because one bank row can only correspond to one invoice), maximum difference for a match = 10 days
invoices.sent_date <= bank_statement.payment_date (because the payment is always made on or after the invoice date)

But I get an error message: "cannot deduplicate join matches"

I searched this forum (and the web) but did not find anything about it.

Thank you for your precious help.


Best Answer

  • Keiji (Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer), Registered Posts: 52
    Answer ✓

    Hello @DavidALI

    Thank you so much for the post on Dataiku Community.

    The DSS execution engine does not support the deduplication of join matches, so you will need to run the join recipe with another execution engine, such as "In-database (SQL)" or "Spark", as follows.

    [Screenshot: Screen Shot 2022-01-19 at 16.44.13.png]
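
    For readers who want to check the matching rules from the question outside of DSS, here is a minimal Python sketch of the same logic: amounts within 1 of each other, payment on or after the sent date, at most 10 days apart, nearest date wins, and each bank row used at most once. It is only an illustration, not how the Dataiku join recipe is implemented. The function name `reconcile`, the tuple layout, and the sample data are all hypothetical, and the greedy strategy (oldest invoice picks first) is an assumption that may not give the best overall assignment when several invoices compete for the same payment.

```python
from datetime import date, timedelta


def reconcile(invoices, bank_rows, amount_tol=1.0, max_days=10):
    """Greedily match each invoice to the nearest later payment of similar amount.

    invoices:  list of (invoice_id, amount, sent_date)      -- hypothetical layout
    bank_rows: list of (bank_id, amount, payment_date)      -- hypothetical layout
    Returns a dict {invoice_id: bank_id or None}.
    """
    used = set()   # bank rows already assigned, so each is matched at most once
    matches = {}
    # Oldest invoices choose first (an assumption, not a rule from the thread).
    for inv_id, inv_amount, sent in sorted(invoices, key=lambda r: r[2]):
        candidates = [
            (paid - sent, bank_id)
            for bank_id, bank_amount, paid in bank_rows
            if bank_id not in used
            and abs(bank_amount - inv_amount) <= amount_tol            # rounding tolerance
            and timedelta(0) <= paid - sent <= timedelta(days=max_days)  # paid on/after sending
        ]
        if candidates:
            # Smallest delay wins; ties are broken by bank_id order.
            _, best = min(candidates)
            used.add(best)
            matches[inv_id] = best
        else:
            matches[inv_id] = None   # left-join behaviour: keep unmatched invoices
    return matches


invoices = [
    ("INV-1", 100.00, date(2022, 1, 3)),
    ("INV-2", 100.40, date(2022, 1, 5)),
    ("INV-3", 250.00, date(2022, 1, 6)),
]
bank_rows = [
    ("B-1", 100.00, date(2022, 1, 7)),
    ("B-2", 100.50, date(2022, 1, 10)),
    ("B-3", 250.00, date(2022, 1, 20)),
]
result = reconcile(invoices, bank_rows)
print(result)
```

    In this sample, INV-3 stays unmatched because the only payment with a matching amount arrives 14 days after the invoice, which is beyond the 10-day limit.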

    Keiji, Dataiku Technical Support

