What formula is used for the fuzzy values clustering in the prepare recipe on DSS?

JCB · May 2022

According to: https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/How-to-standardize-text-fields-using-fuzzy-values-clustering.html

You can choose a clustering strategy of “Fuzzy” or “Highly fuzzy” to cluster and merge similar text in the dataset. What is this fuzzy matching based on? Damerau–Levenshtein? If so, what threshold? It would be great to understand the logic behind the fuzzy clustering before applying the prepare recipe step.
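To make the question concrete, here is a minimal sketch of what I mean by distance-based matching with a threshold. It is not DSS's implementation, which is exactly what I am asking about. The function names `osa_distance` and `cluster_values`, the `max_distance` parameter, the lowercasing step, and the greedy "first matching cluster" rule are all my own assumptions for illustration. The distance used is the optimal string alignment variant of Damerau–Levenshtein (adjacent transpositions count as one edit).

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # adjacent transposition, e.g. "ri" <-> "ir"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def cluster_values(values, max_distance=1):
    """Greedy clustering (hypothetical rule): a value joins the first cluster
    whose first member is within max_distance, ignoring case."""
    clusters = []
    for value in values:
        for cluster in clusters:
            if osa_distance(value.lower(), cluster[0].lower()) <= max_distance:
                cluster.append(value)
                break
        else:
            clusters.append([value])
    return clusters


if __name__ == "__main__":
    print(cluster_values(["Paris", "paris", "Pairs", "London", "Londn"]))
```

With a rule like this, the result depends heavily on the threshold and on which value is picked as the cluster representative, which is why I would like to know what "Fuzzy" and "Highly fuzzy" actually do.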

Thanks in advance.

Operating system used: Windows

CoreyS · May 2022

Hi @JCB, while you wait for a more complete response, here are some resources you may find helpful:

Fuzzy join: joining two datasets (Documentation)
Hands-On Tutorial: Fuzzy Join Recipe (Academy/Knowledge Base)

I hope this helps!

rakesh2 · August 2023

Hi, is there any further update on this question? I would also like to understand the logic behind the fuzzy values clustering.
