Cannot deduplicate join matches

DavidALI · January 2022

Hi,

I am trying to use Dataiku to do a bank reconciliation.

I have a dataset with my invoices (invoices) and a dataset with my bank statement (bank_statement).
invoices contains an amount column (invoices.amount) and a sent date column (invoices.sent_date).
bank_statement contains an amount column (bank_statement.amount) and a payment date column (bank_statement.payment_date).
I tried to join every row of the invoices dataset with the row of bank_statement that has the same amount and the nearest date.

I tried a join recipe (left join) of invoices with bank_statement, with all of the following conditions:
invoices.amount = bank_statement.amount: match all values within a range of 1 (because of possible rounding on one side or the other)
AND
invoices.sent_date = bank_statement.payment_date: match the nearest date, force a single match in case of equality (because one bank row can only correspond to one invoice), maximum difference for a match = 10 days
AND
invoices.sent_date <= bank_statement.payment_date (because the payment is always made on or after the invoice date)

But I get an error message: "cannot deduplicate join matches"

I browsed this forum (and the web) but I did not find anything about it.

Thank you for your valuable help.

David

Keiji · January 2022

Hello @DavidALI,

Thank you so much for the post on Dataiku Community.

The DSS execution engine does not support the deduplication of join matches, so you will need to run the join recipe with another execution engine, such as "In-database (SQL)" or "Spark", as shown below.

[Screenshot: Screen Shot 2022-01-19 at 16.44.13.png]
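For readers who want to see the matching rule from the question spelled out, here is a minimal pandas sketch. It is an illustration only, not how the Dataiku join recipe works internally. The function name reconcile, the parameters amount_tol and max_days, and the helper columns invoice_idx, bank_idx and delay are hypothetical. It assumes the column names from the question (amount, sent_date, payment_date) and resolves ties greedily by shortest delay, which is one possible reading of "force single match".

```python
import pandas as pd


def reconcile(invoices, bank_statement, amount_tol=1.0, max_days=10):
    """Left-join invoices to bank rows, using each bank row at most once.

    Hypothetical helper: greedy matching, shortest payment delay first.
    """
    inv = invoices.reset_index(drop=True).rename(columns={"amount": "invoice_amount"})
    inv.insert(0, "invoice_idx", range(len(inv)))
    bank = bank_statement.reset_index(drop=True).rename(columns={"amount": "bank_amount"})
    bank.insert(0, "bank_idx", range(len(bank)))

    # All invoice/bank pairs (how="cross" needs pandas 1.2 or later).
    pairs = inv.merge(bank, how="cross")
    pairs["delay"] = pairs["payment_date"] - pairs["sent_date"]

    # Keep pairs that satisfy the three conditions from the question.
    ok = (
        ((pairs["invoice_amount"] - pairs["bank_amount"]).abs() <= amount_tol)
        & (pairs["delay"] >= pd.Timedelta(0))
        & (pairs["delay"] <= pd.Timedelta(days=max_days))
    )
    pairs = pairs.loc[ok].sort_values(["delay", "invoice_idx", "bank_idx"])

    # Greedy deduplication: nearest dates first, each row used only once.
    used_inv, used_bank, keep = set(), set(), []
    for row in pairs.itertuples():
        if row.invoice_idx in used_inv or row.bank_idx in used_bank:
            continue
        used_inv.add(row.invoice_idx)
        used_bank.add(row.bank_idx)
        keep.append(row.Index)

    matched = pairs.loc[keep, ["invoice_idx", "bank_idx", "bank_amount", "payment_date"]]
    # Left join so that unmatched invoices are kept with empty bank columns.
    return inv.merge(matched, on="invoice_idx", how="left")


# Made-up sample data, for illustration only.
invoices = pd.DataFrame({
    "amount": [100.0, 100.4, 250.0],
    "sent_date": pd.to_datetime(["2022-01-01", "2022-01-03", "2022-01-05"]),
})
bank_statement = pd.DataFrame({
    "amount": [100.0, 100.0, 250.0],
    "payment_date": pd.to_datetime(["2022-01-04", "2022-01-02", "2022-01-20"]),
})
result = reconcile(invoices, bank_statement)
print(result)
```

The cross join is quadratic in the number of rows, so this sketch only suits small datasets. A greedy pass also does not guarantee the best overall assignment when several invoices compete for the same bank row.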

Sincerely,
Keiji, Dataiku Technical Support

DavidALI · January 2022

Hello KeijiY,

Thank you for your very clear answer.

Sincerely,

David
