Num to cat conversion in clustering: is it possible to have only 1 column (like impact coding) instead of n dummy columns?

Marija
Level 1
In supervised modeling, I was able to convert a feature from numerical to categorical and choose impact coding from the dropdown list. Can I do the same in clustering? More precisely, I am using k-means from the quick models in the Lab on a single dataset. The dropdown list has 4 options, and I am not sure which one to use.



Thank you so much,

Marija
Liev
Dataiker Alumni

Hi Marija,



Impact coding (also called target encoding) is a preprocessing method that encodes a categorical feature using the target variable, typically by replacing each category with a statistic of the target (such as its mean) for that category. A little more info here: http://contrib.scikit-learn.org/categorical-encoding/targetencoder.html
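
To make the dependence on the target concrete, here is a minimal sketch in plain Python. It is not how DSS implements impact coding (real implementations usually add smoothing and guard against leakage); the function name `impact_encode` and the toy data are made up for illustration.

```python
from collections import defaultdict

def impact_encode(categories, target):
    """Replace each category with the mean target value observed for it.

    Hypothetical helper: plain per-category mean, no smoothing.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, target):
        sums[cat] += y
        counts[cat] += 1
    # One number per category, computed from the target
    means = {cat: sums[cat] / counts[cat] for cat in sums}
    return [means[cat] for cat in categories], means

# Toy data (assumed): a binned numerical feature and a binary target
bins = ["low", "low", "high", "high", "mid"]
churned = [0, 1, 1, 1, 0]

encoded, mapping = impact_encode(bins, churned)
print(encoded)
print(mapping)
```

Note that `impact_encode` cannot be called without the `target` argument: the single encoded column is entirely derived from it.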



Because clustering is unsupervised learning, there is no target variable to encode against, so this preprocessing option is not available for clustering models in DSS.
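
By contrast, an encoding that needs no target, such as the dummy encoding mentioned in the question, can be computed from the feature alone. The sketch below is only an illustration in plain Python, with a hypothetical helper `dummy_encode` and toy data; it does not reproduce what DSS does internally.

```python
def dummy_encode(categories):
    """Build one 0/1 column per distinct category (hypothetical helper)."""
    levels = sorted(set(categories))
    rows = [[1 if cat == level else 0 for level in levels] for cat in categories]
    return levels, rows

# Toy data (assumed): a binned numerical feature, no target needed
bins = ["low", "low", "high", "high", "mid"]

levels, rows = dummy_encode(bins)
print(levels)
for row in rows:
    print(row)
```

This yields n columns for n categories, which is the trade-off the question describes: without a target there is no per-category statistic to collapse them into a single impact-coded column.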



I hope this helps!
