Hi, I found the Value Clustering feature under Field -> Analyze very helpful for normalizing data when, say, there are 100 different spellings of "United States". Dataiku's clustering function makes it easier to map the different variations to the same value. However, it won't map USA to United States. Is there a way to train the system to make that mapping? This is just a simple use case. We have more complex clustering and mapping use cases that would benefit from a way to train the system to improve the clustering.
Cheers, Hong
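You could build a table/dataset that lists which values should be mapped to which canonical value, and apply it with a join recipe. That covers cases like USA -> United States, which clustering on spelling similarity won't catch. You could also look at a fuzzy join recipe to handle typos.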
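If you'd rather prototype the idea in code first, here is a minimal sketch in plain Python (standard library only, no Dataiku API involved). It is an illustration of the lookup-plus-fuzzy-match approach, not how the join or fuzzy join recipes are implemented. The `CANONICAL_MAP` table, the `normalize` function, and the `cutoff` value of 0.85 are all assumptions made up for this example.

```python
import difflib

# Hypothetical lookup table: lower-cased variant -> canonical value.
# In practice this would be the mapping dataset you maintain by hand.
CANONICAL_MAP = {
    "usa": "United States",
    "us": "United States",
    "u.s.a.": "United States",
    "united states": "United States",
    "united states of america": "United States",
    "uk": "United Kingdom",
    "united kingdom": "United Kingdom",
}


def normalize(value, mapping=CANONICAL_MAP, cutoff=0.85):
    """Map a raw value to its canonical form.

    First try an exact lookup (case-insensitive, trimmed), which plays the
    role of the join. If that fails, fall back to the closest known variant
    by string similarity, which plays the role of the fuzzy join.
    Values with no sufficiently close variant are returned unchanged.
    """
    key = value.strip().lower()
    if key in mapping:
        return mapping[key]
    # cutoff is an assumed similarity threshold; tune it on your own data.
    close = difflib.get_close_matches(key, list(mapping), n=1, cutoff=cutoff)
    return mapping[close[0]] if close else value


if __name__ == "__main__":
    for raw in ["USA", "United Sttes", "Germany"]:
        print(raw, "->", normalize(raw))
```

The "training" here is just adding rows to the mapping table whenever you spot a variant that isn't handled yet, so the table grows with your data.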