What formula is used for the fuzzy values clustering in the prepare recipe on DSS?

JCB
Level 2
According to: https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/How-to-standardize-text-fields-usin...

You can choose a clustering strategy of “Fuzzy” or “Highly fuzzy” to cluster and merge similar text in the dataset. What is this fuzzy matching based on? Damerau–Levenshtein? If so, what threshold? It would be great to understand the logic behind the fuzzy clustering before applying the prepare recipe step.
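To make the question concrete, here is a minimal sketch of what thresholded Damerau–Levenshtein clustering could look like. Everything in it is an assumption for illustration only: the function names `osa_distance` and `cluster_values`, the greedy grouping strategy, the case folding, and the `max_distance=1` threshold are hypothetical and are not taken from DSS or its documentation.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (the restricted form of
    Damerau-Levenshtein): insertions, deletions, substitutions and
    transpositions of adjacent characters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "ri" <-> "ir"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def cluster_values(values, max_distance=1):
    """Hypothetical greedy clustering: each value joins the first cluster
    whose first member is within max_distance (case-insensitive),
    otherwise it starts a new cluster."""
    clusters = []
    for value in values:
        for cluster in clusters:
            if osa_distance(value.lower(), cluster[0].lower()) <= max_distance:
                cluster.append(value)
                break
        else:
            clusters.append([value])
    return clusters


if __name__ == "__main__":
    print(cluster_values(["Paris", "Pairs", "paris", "London", "Londn"]))
```

With a scheme like this, a larger `max_distance` would merge more aggressively, which is the kind of difference I would expect between “Fuzzy” and “Highly fuzzy”, but that is exactly what I would like to have confirmed.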

Thanks in advance.


Operating system used: Windows

CoreyS
Dataiker Alumni

Hi @JCB, while you wait for a more complete response, here are some resources you may find helpful:

I hope this helps!

rakesh2
Level 1

Hi, is there any further update on this question? I am also interested in the logic behind the fuzzy values clustering.

 
