According to: https://knowledge.dataiku.com/latest/kb/data-prep/prepare-recipe/How-to-standardize-text-fields-usin...
You can choose a clustering strategy of “Fuzzy” or “Highly fuzzy” to cluster and merge similar text in the dataset. What is this fuzzy matching based on? Damerau–Levenshtein? If so, what threshold? It would be great to understand the logic behind the fuzzy clustering before applying the prepare recipe step.
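To make the question concrete, here is a minimal sketch of what threshold-based clustering on an edit distance could look like. It is only an illustration of the idea being asked about, not a description of what the prepare recipe actually does: the distance is the optimal string alignment variant of Damerau–Levenshtein, and the names `osa_distance` and `cluster`, the greedy grouping, and the default threshold of 1 are all assumptions.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Counts insertions, deletions, substitutions, and transpositions of
    two adjacent characters.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "ri" <-> "ir"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def cluster(values, threshold=1):
    """Greedy grouping (an assumption, not Dataiku's logic): each value joins
    the first cluster whose first member is within `threshold` edits,
    compared case-insensitively; otherwise it starts a new cluster."""
    clusters = []
    for value in values:
        for members in clusters:
            if osa_distance(value.lower(), members[0].lower()) <= threshold:
                members.append(value)
                break
        else:
            clusters.append([value])
    return clusters
```

With this sketch, `cluster(["Paris", "Pairs", "paris", "London", "Londn"])` groups the first three values together and the last two together. Whether "Fuzzy" and "Highly fuzzy" correspond to different thresholds of a distance like this, or to something else entirely, is exactly what I would like to know.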
Thanks in advance.
Operating system used: Windows
Hi @JCB, while you wait for a more complete response, here are some resources you may find helpful:
I hope this helps!