What formula is used for the fuzzy values clustering in the prepare recipe on DSS?

JCB
Level 2

According to: https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/How-to-standardize-text-fields-usin...

You can choose a clustering strategy of “Fuzzy” or “Highly fuzzy” to cluster and merge similar text values in the dataset. What is this fuzzy matching based on? Is it Damerau–Levenshtein distance? If so, what threshold is used? It would be great to understand the logic behind the fuzzy clustering before applying it in a prepare recipe step.

Thanks in advance.
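The thread never confirms which algorithm DSS actually uses, so the following is only a minimal Python sketch of what the question describes: clustering text values by Damerau–Levenshtein distance under a threshold. It is an assumption for illustration, not Dataiku's implementation. The functions `osa_distance` and `cluster_values`, the lowercase/strip normalization, the greedy grouping strategy, and the `max_distance=1` threshold are all hypothetical choices.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment variant of Damerau-Levenshtein distance.

    Insertions, deletions, substitutions and transpositions of two adjacent
    characters each cost 1 (no substring is edited more than once).
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[rows - 1][cols - 1]


def cluster_values(values, max_distance=1):
    """HYPOTHETICAL greedy clustering, not the DSS algorithm.

    Each value (compared after strip + lowercase) joins the first cluster whose
    first member is within max_distance; otherwise it starts a new cluster.
    """
    clusters = []
    for value in values:
        key = value.strip().lower()
        for cluster in clusters:
            if osa_distance(key, cluster[0].strip().lower()) <= max_distance:
                cluster.append(value)
                break
        else:
            clusters.append([value])
    return clusters


print(cluster_values(["Paris", "paris ", "Pairs", "London", "Lodnon", "Berlin"]))
# [['Paris', 'paris ', 'Pairs'], ['London', 'Lodnon'], ['Berlin']]
```

In this sketch, "Pairs" and "Lodnon" each differ from their cluster's first member by one adjacent transposition, so a threshold of 1 merges them. Raising `max_distance` would be one way to model a "more fuzzy" setting, though how DSS maps “Fuzzy” and “Highly fuzzy” to concrete parameters is exactly what the question leaves open.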


Operating system used: Windows

CoreyS
Dataiker Alumni

Hi @JCB, while you wait for a more complete response, here are some resources you may find helpful:

I hope this helps!

rakesh2
Level 1

Hi, is there any further update on this question? I am also interested in understanding the logic behind the fuzzy values clustering.