Fuzzy join matching results above threshold

thibault-boyeux

Hello,

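Using the latest version of Dataiku, I performed a fuzzy join between two datasets with two conditions (see screenshot below): a Levenshtein distance on the company name (raisonSociale / company_name) and a geographic distance on GeoPoint.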

I'm encountering an issue: some rows are joined when it looks like they shouldn't be. The distance for the second condition (Geo) is above its threshold (6.0 vs. 4.0), but the rows are still joined (isMatch = true, see JSON below).
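The expected behaviour can be written as a minimal Python sketch. It assumes that a condition passes only when distance <= threshold (inclusive) and that a pair is joined only when every condition passes; both are assumptions about Dataiku's semantics, not taken from its documentation, and the `Condition` class and `expected_is_match` function are hypothetical names used only for illustration.

```python
# Hypothetical sketch of the expected fuzzy-join rule (not Dataiku's code).
# Assumptions: threshold is inclusive, conditions are combined with AND.
from dataclasses import dataclass


@dataclass
class Condition:
    distance_type: str  # e.g. "LEVENSHTEIN" or "GEO"
    distance: float
    threshold: float

    def passes(self) -> bool:
        # Assumption: a condition matches when distance <= threshold.
        return self.distance <= self.threshold


def expected_is_match(conditions: list[Condition]) -> bool:
    # Every condition must pass for the pair to be joined.
    return all(c.passes() for c in conditions)


# Values reported in the join meta info for the pair in question.
reported = [
    Condition("LEVENSHTEIN", distance=3.1, threshold=6.0),
    Condition("GEO", distance=6.0, threshold=4.0),
]

print(expected_is_match(reported))  # False, while the meta info says isMatch = true
```

Under these assumptions the Geo condition alone already fails (6.0 > 4.0), so its per-condition isMatch = true is inconsistent regardless of how the conditions are combined.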

Am I missing something?

Thanks a lot for your help!

Best,

Thibault


Fuzzy join info

[Screenshot: dataiku.png]

Join meta info

[
  {
    "pairs": [
      {"dataset": "dataset1_data_final", "joinKey": "raisonSociale", "joinValue": "ferme ruelles"},
      {"dataset": "dataset2_data_final", "joinKey": "company_name", "joinValue": "ferme corbie"}
    ],
    "isMatch": true,
    "distance": 3.1,
    "threshold": 6.0,
    "distanceType": "LEVENSHTEIN"
  },
  {
    "pairs": [
      {"dataset": "dataset1_data_final", "joinKey": "GeoPoint", "joinValue": "POINT(1.518615 49.140605)"},
      {"dataset": "dataset2_data_final", "joinKey": "GeoPoint", "joinValue": "POINT(1.546123 49.16188)"}
    ],
    "isMatch": true,
    "distance": 6.0,
    "threshold": 4.0,
    "distanceType": "GEO"
  }
]
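As a sanity check, both distances for this pair can be recomputed locally from the join values in the meta info. This is a minimal sketch: it assumes the GEO distance is in kilometres and close to a great-circle (haversine) distance, which may not match Dataiku's actual implementation. The helper names are hypothetical.

```python
# Recompute both distances for the pair in the join meta info.
# Assumptions: GEO distance is in km and approximates a haversine distance.
import math
import re


def levenshtein(a: str, b: str) -> int:
    # Dynamic-programming edit distance (insert, delete, substitute cost 1).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def parse_wkt_point(wkt: str) -> tuple[float, float]:
    # "POINT(lon lat)" -> (lon, lat); WKT lists longitude first.
    lon, lat = map(float, re.fullmatch(r"POINT\((\S+) (\S+)\)", wkt).groups())
    return lon, lat


def haversine_km(p1: tuple[float, float], p2: tuple[float, float]) -> float:
    (lon1, lat1), (lon2, lat2) = p1, p2
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # mean Earth radius in km


name_dist = levenshtein("ferme ruelles", "ferme corbie")
geo_dist = haversine_km(
    parse_wkt_point("POINT(1.518615 49.140605)"),
    parse_wkt_point("POINT(1.546123 49.16188)"),
)
print(name_dist, round(geo_dist, 1))
```

With these inputs it prints `6 3.1`: an edit distance of 6 for the names (at the 6.0 Levenshtein threshold) and roughly 3.1 km between the points (below the 4.0 Geo threshold). That looks as if the two reported distances were attached to the wrong conditions in the meta info; this is an observation from a single pair, not a confirmed diagnosis.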



Operating system used: Docker on Linux
