Fuzzy join matching results above threshold



Using the latest version of Dataiku, I performed a fuzzy join on two datasets, with two conditions (see screenshot below).

I'm encountering an issue: some values are joined when it looks like they shouldn't be. In the second condition (Geo), the reported distance (6.0) is above the threshold (4.0), but the join is still performed (isMatch = true, see JSON below).

Am I missing something?

Thanks a lot for the help




Fuzzy join info


Join meta info



[{"pairs":[{"dataset":"dataset1_data_final","joinKey":"raisonSociale","joinValue":"ferme ruelles"},{"dataset":"dataset2_data_final","joinKey":"company_name","joinValue":"ferme corbie"}],"isMatch":true,"distance":3.1,"threshold":6.0,"distanceType":"LEVENSHTEIN"},{"pairs":[{"dataset":"dataset1_data_final","joinKey":"GeoPoint","joinValue":"POINT(1.518615 49.140605)"},{"dataset":"dataset2_data_final","joinKey":"GeoPoint","joinValue":"POINT(1.546123 49.16188)"}],"isMatch":true,"distance":6.0,"threshold":4.0,"distanceType":"GEO"}]
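To make the inconsistency easier to spot, here is a minimal Python sketch that parses the join meta info above and lists every condition marked as a match although its distance exceeds its threshold. It assumes that a condition should only match when distance <= threshold, which is my reading and not something I have confirmed in the Dataiku documentation; the helper name `inconsistent_conditions` is just illustrative.

```python
import json

# Join meta info copied verbatim from the post above.
meta_json = '''[{"pairs":[{"dataset":"dataset1_data_final","joinKey":"raisonSociale","joinValue":"ferme ruelles"},{"dataset":"dataset2_data_final","joinKey":"company_name","joinValue":"ferme corbie"}],"isMatch":true,"distance":3.1,"threshold":6.0,"distanceType":"LEVENSHTEIN"},{"pairs":[{"dataset":"dataset1_data_final","joinKey":"GeoPoint","joinValue":"POINT(1.518615 49.140605)"},{"dataset":"dataset2_data_final","joinKey":"GeoPoint","joinValue":"POINT(1.546123 49.16188)"}],"isMatch":true,"distance":6.0,"threshold":4.0,"distanceType":"GEO"}]'''


def inconsistent_conditions(meta):
    """Return conditions flagged as a match although distance > threshold.

    Assumption: a condition is expected to match only when
    distance <= threshold (not verified against Dataiku's semantics).
    """
    return [
        cond for cond in meta
        if cond["isMatch"] and cond["distance"] > cond["threshold"]
    ]


conditions = json.loads(meta_json)
flagged = inconsistent_conditions(conditions)

for cond in flagged:
    keys = [pair["joinKey"] for pair in cond["pairs"]]
    print(cond["distanceType"], keys, cond["distance"], ">", cond["threshold"])
```

On this record only the GEO condition is flagged; the LEVENSHTEIN condition (3.1 against a threshold of 6.0) is consistent with isMatch = true. The units of the reported distance and threshold are not stated in the meta info, so whether the two numbers are directly comparable is part of the question.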



Operating system used: Docker on Linux

