Discover this year's submissions to the Dataiku Frontrunner Awards and give kudos to your favorite use cases and success stories!READ MORE

What formula is used for the fuzzy values clustering in the prepare recipe on DSS?

JCB
Level 1
What formula is used for the fuzzy values clustering in the prepare recipe on DSS?

According to: https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/How-to-standardize-text-fields-usin...

You can choose a clustering strategy of “Fuzzy” or “Highly fuzzy” to cluster and merge similar text in the dataset. What is this fuzzy matching based on? Damerau–Levenshtein? If so, what threshold? It would be great to understand the logic behind the fuzzy clustering before applying the prepare recipe step.

Thanks in advance.


Operating system used: windows

0 Kudos
1 Reply
CoreyS
Community Manager
Community Manager

Hi @JCB while you wait for a more complete response, here are some resources you may find helpful:

I hope this helps!

Looking for more resources to help you use Dataiku effectively and upskill your knowledge? Check out these great resources: Dataiku Academy | Documentation | Knowledge Base

A reply answered your question? Mark as ‘Accepted Solution’ to help others like you!
0 Kudos