Fuzzy Match

mageshkumar

I need to do a fuzzy match based on Jaro distance. I have two columns (X, Y), and the Y column has two unique values. The fuzzy match needs to take the shortest string from the X column and compare it with the X values that belong to the other Y value, and then do likewise for all the X column values. If a comparison satisfies the predefined threshold value, the values should be stored in the output along with the match score and match key. Is there any way to do this in Dataiku?

mageshkumar

I have already tried the Dataiku fuzzy join, but it doesn't fit this use case. I'm adding screenshots of the input and output data for clarification. I need to get exactly the output shared below for the given input data.

Turribeach

Hi, I'm afraid I don't see any screenshots in your post. Also, there are 4 different algorithms in the fuzzy join, so you would need to see which one is the closest to your use case. Did you try them all? If none of these 4 algorithms matches your requirements, you will need to do this in a Python recipe and come up with your own algorithm.
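To give an idea of what the Python recipe route could look like, here is a minimal sketch in plain Python. It is only one possible reading of the requirement, not a confirmed solution. It assumes that each X value should be compared only with X values from the other Y group, that strings are processed shortest first, and that the shorter string of a pair serves as the match key. The function names (`jaro_similarity`, `fuzzy_match_groups`), the 0.85 threshold, and the sample rows are all made up for illustration.

```python
def jaro_similarity(s1, s2):
    """Standard Jaro similarity between two strings, in the range 0.0 to 1.0."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only if they are within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def fuzzy_match_groups(rows, threshold=0.85):
    """rows is a list of (x, y) tuples. Returns matches across different Y groups.

    Assumption: the shorter string of each pair is used as the match key.
    """
    ordered = sorted(rows, key=lambda r: len(r[0]))  # shortest strings first
    results = []
    for i, (x1, y1) in enumerate(ordered):
        for x2, y2 in ordered[i + 1:]:
            if y1 == y2:
                continue  # only compare against the other Y group
            score = jaro_similarity(x1, x2)
            if score >= threshold:
                results.append({"match_key": x1, "x": x2, "y": y2,
                                "score": round(score, 4)})
    return results


# Hypothetical sample data: (X value, Y group)
rows = [("MARTHA", "A"), ("MARHTA", "B"), ("DIXON", "A"),
        ("DICKSONX", "B"), ("JONES", "B")]
matches = fuzzy_match_groups(rows, threshold=0.85)
for m in matches:
    print(m)
```

In a Python recipe you would load `rows` from your input dataset instead of hard-coding it and write the result list to the output dataset. If your definition of the match key or the comparison order is different, only `fuzzy_match_groups` needs to change.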

Turribeach

Rather than posting your data, can you post how you configured the algorithms in the fuzzy join?
