Hello,
Using the latest version of Dataiku, I performed a fuzzy join on two datasets, with two conditions (see screenshot below).
I'm encountering an issue: some values are joined when it looks like they shouldn't be. The distance in the second condition (Geo) is above the threshold (6.0 versus 4.0), but the join is still performed (isMatch = true, see JSON below).
Am I missing something?
Thanks a lot for the help
Best,
Thibault
Fuzzy join info
Join meta info
[{"pairs":[{"dataset":"dataset1_data_final","joinKey":"raisonSociale","joinValue":"ferme ruelles"},{"dataset":"dataset2_data_final","joinKey":"company_name","joinValue":"ferme corbie"}],"isMatch":true,"distance":3.1,"threshold":6.0,"distanceType":"LEVENSHTEIN"},{"pairs":[{"dataset":"dataset1_data_final","joinKey":"GeoPoint","joinValue":"POINT(1.518615 49.140605)"},{"dataset":"dataset2_data_final","joinKey":"GeoPoint","joinValue":"POINT(1.546123 49.16188)"}],"isMatch":true,"distance":6.0,"threshold":4.0,"distanceType":"GEO"}]
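To make the inconsistency easier to spot across many rows, here is a minimal Python sketch that parses the join meta info above and lists every condition reported as a match even though its distance exceeds its threshold. The function name `find_inconsistent_conditions` is hypothetical, and the check assumes that `distance` and `threshold` are expressed in the same unit, which the JSON itself does not confirm.

```python
import json

# Join meta info copied from the post (one entry per join condition).
JOIN_META = """
[{"pairs":[{"dataset":"dataset1_data_final","joinKey":"raisonSociale","joinValue":"ferme ruelles"},
           {"dataset":"dataset2_data_final","joinKey":"company_name","joinValue":"ferme corbie"}],
  "isMatch":true,"distance":3.1,"threshold":6.0,"distanceType":"LEVENSHTEIN"},
 {"pairs":[{"dataset":"dataset1_data_final","joinKey":"GeoPoint","joinValue":"POINT(1.518615 49.140605)"},
           {"dataset":"dataset2_data_final","joinKey":"GeoPoint","joinValue":"POINT(1.546123 49.16188)"}],
  "isMatch":true,"distance":6.0,"threshold":4.0,"distanceType":"GEO"}]
"""


def find_inconsistent_conditions(meta_json):
    """Return conditions flagged as a match although distance > threshold.

    Assumption: distance and threshold use the same unit.
    """
    flagged = []
    for cond in json.loads(meta_json):
        if cond["isMatch"] and cond["distance"] > cond["threshold"]:
            keys = [pair["joinKey"] for pair in cond["pairs"]]
            flagged.append(
                (cond["distanceType"], keys, cond["distance"], cond["threshold"])
            )
    return flagged


if __name__ == "__main__":
    for distance_type, keys, distance, threshold in find_inconsistent_conditions(JOIN_META):
        print(f"{distance_type} on {keys}: distance {distance} > threshold {threshold}")
```

On the row above, only the GEO condition is flagged; the Levenshtein condition (3.1 against a threshold of 6.0) is within bounds.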
Operating system used: Docker on Linux