Fuzzy join matching results above threshold
thibault-boyeux
Hello,
Using the latest version of Dataiku, I performed a fuzzy join on two datasets, with two conditions (see screenshot below).
I'm encountering an issue: some values are joined when it looks like they shouldn't be. In the second condition (Geo), the reported distance (6.0) is above the threshold (4.0), but the pair is still reported as a match (isMatch = true, see JSON below).
Am I missing something?
Thanks a lot for the help
Best,
Thibault
Fuzzy join info
Join meta info
[{"pairs":[{"dataset":"dataset1_data_final","joinKey":"raisonSociale","joinValue":"ferme ruelles"},{"dataset":"dataset2_data_final","joinKey":"company_name","joinValue":"ferme corbie"}],"isMatch":true,"distance":3.1,"threshold":6.0,"distanceType":"LEVENSHTEIN"},{"pairs":[{"dataset":"dataset1_data_final","joinKey":"GeoPoint","joinValue":"POINT(1.518615 49.140605)"},{"dataset":"dataset2_data_final","joinKey":"GeoPoint","joinValue":"POINT(1.546123 49.16188)"}],"isMatch":true,"distance":6.0,"threshold":4.0,"distanceType":"GEO"}]
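One way to sanity-check the meta info is to recompute both distances independently and compare them with the reported values. The sketch below is only an illustration, not Dataiku's implementation: it assumes the WKT points are in `POINT(lon lat)` order, uses a plain haversine great-circle distance in kilometres (the unit of the Geo threshold is not stated in the post), and a textbook Levenshtein distance. The helper names (`levenshtein`, `haversine_km`, `recompute`) are hypothetical, and the JSON is trimmed to the fields used here.

```python
import json
import math

# Join meta info from the post, trimmed to the fields this check needs.
META = json.loads("""
[
  {"pairs": [{"joinValue": "ferme ruelles"}, {"joinValue": "ferme corbie"}],
   "isMatch": true, "distance": 3.1, "threshold": 6.0,
   "distanceType": "LEVENSHTEIN"},
  {"pairs": [{"joinValue": "POINT(1.518615 49.140605)"},
             {"joinValue": "POINT(1.546123 49.16188)"}],
   "isMatch": true, "distance": 6.0, "threshold": 4.0,
   "distanceType": "GEO"}
]
""")


def levenshtein(a, b):
    """Classic edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def parse_wkt_point(wkt):
    """Parse 'POINT(lon lat)' into (lon, lat); assumes lon-first order."""
    lon, lat = wkt.strip()[len("POINT("):-1].split()
    return float(lon), float(lat)


def haversine_km(p1, p2):
    """Great-circle distance in km, assuming a 6371 km mean Earth radius."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p1, *p2))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def recompute(cond):
    """Recompute the distance for one join condition from its two values."""
    left, right = (p["joinValue"] for p in cond["pairs"])
    if cond["distanceType"] == "LEVENSHTEIN":
        return levenshtein(left, right)
    return haversine_km(parse_wkt_point(left), parse_wkt_point(right))


for cond in META:
    print(cond["distanceType"],
          "reported:", cond["distance"],
          "threshold:", cond["threshold"],
          "recomputed:", round(recompute(cond), 2))
```

Under these assumptions, the recomputed Levenshtein distance is 6 and the recomputed geo distance is about 3.1 km, which is the reverse of the `distance` values reported for each condition. If the Geo threshold is expressed in kilometres, both conditions would then be within their thresholds, so the `distance` fields in the meta info may simply be attached to the wrong condition. That reading is a guess from the numbers and has not been confirmed against Dataiku's behaviour.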
Operating system used: Docker on Linux