What formula is used for the fuzzy values clustering in the prepare recipe on DSS?
JCB
Registered Posts: 7 ✭✭✭
You can choose a clustering strategy of “Fuzzy” or “Highly fuzzy” to cluster and merge similar text in the dataset. What is this fuzzy matching based on? Damerau–Levenshtein? If so, what threshold? It would be great to understand the logic behind the fuzzy clustering before applying the prepare recipe step.
Thanks in advance.
Operating system used: Windows
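For readers unfamiliar with the terms in the question, here is a minimal sketch of what threshold-based clustering on a Damerau-Levenshtein-style distance could look like. It is not DSS's implementation, which this thread does not document. The function names `osa_distance` and `cluster_values`, the `max_distance` threshold, and the greedy grouping strategy are all assumptions made for illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Counts insertions, deletions, substitutions, and transpositions of
    two adjacent characters, each with a cost of 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "ri" <-> "ir"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def cluster_values(values, max_distance=1):
    """Hypothetical greedy clustering (an assumption, not DSS's logic).

    Each value joins the first cluster whose first member is within
    max_distance (case-insensitive); otherwise it starts a new cluster.
    """
    clusters = []
    for value in values:
        for cluster in clusters:
            if osa_distance(value.lower(), cluster[0].lower()) <= max_distance:
                cluster.append(value)
                break
        else:
            clusters.append([value])
    return clusters


if __name__ == "__main__":
    print(cluster_values(["Paris", "Pairs", "London", "Londn", "Berlin"]))
```

Under this sketch, a "Fuzzy" versus "Highly fuzzy" setting would correspond to a smaller or larger `max_distance`, but whether DSS works this way is exactly what the question asks.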
Answers
-
CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭
Hi @JCB
While you wait for a more complete response, here are some resources you may find helpful:
- Fuzzy join: joining two datasets (Documentation)
- Hands-On Tutorial: Fuzzy Join Recipe (Academy/Knowledge Base)
I hope this helps!
-
Hi, is there any further update on this question? I am also interested in the logic behind the fuzzy values clustering.