Fuzzy Match
I need to do a fuzzy match based on Jaro distance. I have two columns (X, Y), and the Y column has two unique values. The fuzzy match needs to take the shortest string from the X column and compare it with the X values that belong to the other Y value, and likewise for all the X column values. If a pair satisfies the predefined threshold, the values should be stored in the output along with the match score and match key. Is there any available way to do this in Dataiku?
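For reference, the standard Jaro similarity between two strings is given below. Here m is the number of matching characters (equal characters no more than ⌊max(|s1|, |s2|)/2⌋ − 1 positions apart) and t is half the number of matched characters that appear in a different order. Note that libraries differ in naming: some call the similarity itself "Jaro distance", while others define the distance as 1 − similarity, so it is worth checking which one a threshold refers to.

```latex
\mathrm{sim}_j(s_1, s_2) =
\begin{cases}
0 & \text{if } m = 0,\\[4pt]
\dfrac{1}{3}\left(\dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m - t}{m}\right) & \text{otherwise,}
\end{cases}
\qquad d_j(s_1, s_2) = 1 - \mathrm{sim}_j(s_1, s_2)

% Worked example: s_1 = \text{MARTHA}, s_2 = \text{MARHTA}
% m = 6, t = 1 (the T/H pair is swapped)
\mathrm{sim}_j = \tfrac{1}{3}\left(\tfrac{6}{6} + \tfrac{6}{6} + \tfrac{6 - 1}{6}\right) \approx 0.944
```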
Answers
-
Turribeach
-
Magesh kumar
I have already tried the Dataiku fuzzy join, but it doesn't fit this use case. I'm adding screenshots of the input and output data for clarification. I need to get exactly the output shown below for the given input data.
-
Turribeach
Hi, I'm afraid I don't see any screenshots in your post. Also, there are 4 different algorithms in the fuzzy join, so you would need to check which one is closest to your use case. Did you try them all? If none of these 4 algorithms matches your requirements, you will need to do this in a Python recipe and come up with your own algorithm.
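If the fuzzy join doesn't fit, a Python recipe along the following lines could work. This is a minimal sketch, not a Dataiku feature: it implements Jaro similarity in plain Python, splits the rows by the two Y values, walks the X values of the first Y group from shortest to longest, and compares each one with every X value of the other group. The function names (`jaro_similarity`, `fuzzy_match`), the output field names, the default threshold of 0.85, and the choice of the anchor X value as the "match key" are all hypothetical assumptions, since the question doesn't define them. In a real recipe the `(x, y)` rows would come from the input dataset (for example via its DataFrame) and the results would be written to the output dataset.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they are within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2  # transpositions = half the out-of-order matches
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def fuzzy_match(rows, threshold=0.85):
    """rows: iterable of (x, y) pairs where y has exactly two distinct values.

    Hypothetical output: one dict per pair whose score >= threshold.
    """
    groups = {}
    for x, y in rows:
        groups.setdefault(y, []).append(x)
    if len(groups) != 2:
        raise ValueError("expected exactly two distinct values in column Y")
    (y_a, xs_a), (y_b, xs_b) = groups.items()
    results = []
    # Start from the shortest X string of the first group (assumed reading).
    for x_a in sorted(xs_a, key=len):
        for x_b in xs_b:
            score = jaro_similarity(x_a, x_b)
            if score >= threshold:
                results.append({
                    "x": x_a, "y": y_a,
                    "matched_x": x_b, "matched_y": y_b,
                    "match_score": round(score, 4),
                    "match_key": x_a,  # hypothetical: anchor value as key
                })
    return results


if __name__ == "__main__":
    sample = [("Apple Inc", "A"), ("Microsoft", "A"),
              ("Apple Inc.", "B"), ("Google", "B")]
    for row in fuzzy_match(sample, threshold=0.85):
        print(row)
```

Because every X value of one group is compared with every X value of the other, the cost grows with the product of the two group sizes; for large datasets, a blocking step (for example only comparing values that share a first letter) would be a reasonable addition.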
-
Turribeach
Rather than posting your data, can you post how you configured the algorithms in the fuzzy join?