Fuzzy join matching results above threshold

thibault-boyeux
edited July 16 in Using Dataiku

Hello,

Using the latest version of Dataiku, I performed a fuzzy join on two datasets, with two conditions (see screenshot below).

I'm encountering an issue: some values are joined when it looks like they shouldn't be. For the second condition (Geo), the distance is above its threshold (6.0 vs 4.0), but the rows are still joined (isMatch = true; see the JSON below).

Am I missing something?

Thanks a lot for your help.

Best,

Thibault

Fuzzy join info

[Screenshot: dataiku.png]

Join meta info

[
  {
    "pairs": [
      {"dataset": "dataset1_data_final", "joinKey": "raisonSociale", "joinValue": "ferme ruelles"},
      {"dataset": "dataset2_data_final", "joinKey": "company_name", "joinValue": "ferme corbie"}
    ],
    "isMatch": true,
    "distance": 3.1,
    "threshold": 6.0,
    "distanceType": "LEVENSHTEIN"
  },
  {
    "pairs": [
      {"dataset": "dataset1_data_final", "joinKey": "GeoPoint", "joinValue": "POINT(1.518615 49.140605)"},
      {"dataset": "dataset2_data_final", "joinKey": "GeoPoint", "joinValue": "POINT(1.546123 49.16188)"}
    ],
    "isMatch": true,
    "distance": 6.0,
    "threshold": 4.0,
    "distanceType": "GEO"
  }
]
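One way to sanity-check these numbers is to recompute both distances directly from the join values. The sketch below is a minimal Python check that rests on two assumptions Dataiku's documentation does not confirm. It treats the LEVENSHTEIN distance as the plain, unnormalized edit distance. It treats the GEO distance as the great-circle (haversine) distance in kilometres between the WKT points, read as POINT(longitude latitude). The helpers levenshtein and haversine_km are hypothetical and were written only for this check.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance: insertions, deletions and substitutions each cost 1.

    Assumption: Dataiku's LEVENSHTEIN distance is this unnormalized version.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float,
                 radius_km: float = 6371.0) -> float:
    """Great-circle distance in km between two (lon, lat) points.

    Assumption: Dataiku's GEO distance is expressed in kilometres.
    WKT POINT(x y) stores longitude first, then latitude.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


# Join values copied from the meta info above.
name_dist = levenshtein("ferme ruelles", "ferme corbie")
geo_dist = haversine_km(1.518615, 49.140605, 1.546123, 49.16188)

# A condition passes when its distance is <= its threshold (thresholds from the JSON).
checks = {
    "raisonSociale/company_name (LEVENSHTEIN, threshold 6.0)": (name_dist, 6.0),
    "GeoPoint (GEO, threshold 4.0)": (geo_dist, 4.0),
}
for label, (dist, threshold) in checks.items():
    print(f"{label}: distance={dist:.1f}, passes={dist <= threshold}")
```

Under those assumptions, the two names come out at an edit distance of 6 and the two GeoPoints roughly 3.1 km apart. These are the same two values reported in the meta info, but attached to the opposite conditions. If the assumptions hold, both conditions are within their thresholds (6 ≤ 6.0 and 3.1 ≤ 4.0). In that case isMatch = true would be correct, and only the distance values would be displayed on the wrong condition.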


Operating system used: Docker on Linux
