Geo-Join Nearest
Hello, first-time question poster here and new to Dataiku. I've replicated a flow that I created in another tool. It uses a geo-join in a Prepare recipe to find the nearest location between 2 lists of locations that have longitude and latitude data points. When I compare Dataiku's output with the other tool's output, I see some different nearest-location matches, as well as some variation in the distance reported between the same 2 locations. To account for these differences, I'm trying to understand how Dataiku's geo-join determines the nearest location, but I haven't been able to find any documentation or references on it. Does Dataiku use the straight-line distance between the 2 points, the shortest driving route, or some other approach?
Any insights into this would be much appreciated. Thanks!
Operating system used: Windows
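"Straight-line distance" alone does not fully pin down the method, and that can explain different matches. Below is a minimal Python sketch with hypothetical points. It is not how either tool is confirmed to work. It shows that plain Euclidean distance on raw lat/lon degrees and great-circle (haversine) distance on a sphere can pick different nearest locations. This happens most noticeably away from the equator, where a degree of longitude covers less ground than a degree of latitude. The Earth radius constant and the helper names are assumptions made for illustration.

```python
import math

# Mean Earth radius in km; an assumed constant for this spherical sketch.
EARTH_RADIUS_KM = 6371.0088


def planar_degrees(p, q):
    """Naive 'straight line' on raw (lat, lon) degrees; ignores the Earth's curvature."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def haversine_km(p, q):
    """Great-circle distance in km on a sphere; p and q are (lat, lon) in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest(origin, candidates, dist):
    """Return the (name, point) pair closest to origin under the given distance function."""
    return min(candidates.items(), key=lambda item: dist(origin, item[1]))


# Hypothetical points at 60 degrees north.
origin = (60.0, 0.0)
candidates = {"A": (60.0, 1.5), "B": (61.2, 0.0)}

print(nearest(origin, candidates, planar_degrees)[0])  # B (1.2 deg vs 1.5 deg)
print(nearest(origin, candidates, haversine_km)[0])    # A (about 83 km vs about 133 km)
```

At 60 degrees north, 1.5 degrees of longitude is only about half as long on the ground as 1.5 degrees of latitude. The degree-based metric therefore ranks the candidates the other way round.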
Answers
importthepandas
I believe they use a JTS wrapper together with this GeoTools class: https://docs.geotools.org/stable/javadocs/org/geotools/referencing/GeodeticCalculator.html
The write-up is here: https://blog.dataiku.com/building-the-geospatial-join-recipe-in-dataiku
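If that is right, distances would be geodesic "as the crow flies" distances on an ellipsoid. To my understanding, GeodeticCalculator's default constructor uses WGS84. These are not driving routes, and they are not distances on a sphere. A tool that uses a spherical formula can report distances that differ by a fraction of a percent. Below is a rough Python sketch that compares a spherical haversine distance with Vincenty's inverse formula on WGS84. Vincenty is only a stand-in here. GeodeticCalculator's internal algorithm may differ, and I haven't confirmed which settings Dataiku passes to it. The function names and constants are assumptions for illustration.

```python
import math

# WGS84 ellipsoid parameters (semi-major axis in metres, flattening, semi-minor axis).
WGS84_A = 6378137.0
WGS84_F = 1 / 298.257223563
WGS84_B = WGS84_A * (1 - WGS84_F)
# Mean Earth radius in metres for the spherical comparison (assumed constant).
SPHERE_R = 6371008.8


def haversine_m(p, q):
    """Great-circle distance in metres on a sphere; p and q are (lat, lon) in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * SPHERE_R * math.asin(math.sqrt(a))


def vincenty_m(p, q, max_iter=200, tol=1e-12):
    """Ellipsoidal (WGS84) distance in metres via Vincenty's inverse formula."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    f = WGS84_F
    L = lon2 - lon1
    # Reduced latitudes on the auxiliary sphere.
    U1 = math.atan((1 - f) * math.tan(lat1))
    U2 = math.atan((1 - f) * math.tan(lat2))
    sinU1, cosU1 = math.sin(U1), math.cos(U1)
    sinU2, cosU2 = math.sin(U2), math.cos(U2)
    lam = L
    for _ in range(max_iter):
        sin_lam, cos_lam = math.sin(lam), math.cos(lam)
        sin_sigma = math.hypot(cosU2 * sin_lam, cosU1 * sinU2 - sinU1 * cosU2 * cos_lam)
        if sin_sigma == 0:
            return 0.0  # coincident points
        cos_sigma = sinU1 * sinU2 + cosU1 * cosU2 * cos_lam
        sigma = math.atan2(sin_sigma, cos_sigma)
        sin_alpha = cosU1 * cosU2 * sin_lam / sin_sigma
        cos2_alpha = 1 - sin_alpha ** 2
        # On an equatorial line cos2_alpha is 0 and cos(2*sigma_m) is taken as 0.
        cos_2sm = (cos_sigma - 2 * sinU1 * sinU2 / cos2_alpha) if cos2_alpha != 0 else 0.0
        C = f / 16 * cos2_alpha * (4 + f * (4 - 3 * cos2_alpha))
        lam_prev = lam
        lam = L + (1 - C) * f * sin_alpha * (
            sigma + C * sin_sigma * (cos_2sm + C * cos_sigma * (-1 + 2 * cos_2sm ** 2)))
        if abs(lam - lam_prev) < tol:
            break
    else:
        raise ValueError("Vincenty did not converge (points are nearly antipodal)")
    u2 = cos2_alpha * (WGS84_A ** 2 - WGS84_B ** 2) / WGS84_B ** 2
    A = 1 + u2 / 16384 * (4096 + u2 * (-768 + u2 * (320 - 175 * u2)))
    B = u2 / 1024 * (256 + u2 * (-128 + u2 * (74 - 47 * u2)))
    delta_sigma = B * sin_sigma * (cos_2sm + B / 4 * (
        cos_sigma * (-1 + 2 * cos_2sm ** 2)
        - B / 6 * cos_2sm * (-3 + 4 * sin_sigma ** 2) * (-3 + 4 * cos_2sm ** 2)))
    return WGS84_B * A * (sigma - delta_sigma)


# One degree of latitude north from the equator.
p, q = (0.0, 0.0), (1.0, 0.0)
print(round(vincenty_m(p, q)), round(haversine_m(p, q)))
```

For that pair, the ellipsoidal result is about 110.57 km and the spherical one is about 111.20 km, a gap of roughly 0.6%. A gap of this size could produce the kind of distance variation described in the question. Near-ties in the nearest match could also flip.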