
What formula is used for the fuzzy values clustering in the prepare recipe on DSS?

JCB
Level 2

According to: https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/How-to-standardize-text-fields-usin...

You can choose a clustering strategy of “Fuzzy” or “Highly fuzzy” to cluster and merge similar text values in the dataset. What is this fuzzy matching based on? Is it Damerau–Levenshtein distance? If so, what threshold does each strategy use? It would be great to understand the logic behind the fuzzy clustering before applying this Prepare recipe step.
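
To make the question concrete, here is a minimal Python sketch of the kind of logic I have in mind. It is purely illustrative and not taken from DSS: it assumes the optimal string alignment variant of Damerau–Levenshtein distance, a simple greedy grouping pass, and a hypothetical `max_distance` threshold. The names `osa_distance` and `cluster_values` are my own.

```python
from typing import List


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Counts insertions, deletions, substitutions, and transpositions of
    two adjacent characters, each with a cost of 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def cluster_values(values: List[str], max_distance: int = 1) -> List[List[str]]:
    """Greedy grouping (an assumption, not the DSS method): each value joins
    the first cluster whose first member is within max_distance of it,
    compared case-insensitively; otherwise it starts a new cluster."""
    clusters: List[List[str]] = []
    for value in values:
        for cluster in clusters:
            if osa_distance(value.lower(), cluster[0].lower()) <= max_distance:
                cluster.append(value)
                break
        else:
            clusters.append([value])
    return clusters


if __name__ == "__main__":
    print(cluster_values(["Paris", "paris", "Pairs", "London", "Londn"]))
```

With `max_distance=1`, values such as "Paris", "paris" and "Pairs" end up in one group. Whether DSS works along these lines, and what threshold would separate “Fuzzy” from “Highly fuzzy”, is exactly what I would like to know.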

Thanks in advance.


Operating system used: Windows

CoreyS
Dataiker Alumni

Hi @JCB, while you wait for a more complete response, here are some resources you may find helpful:

I hope this helps!
