Does the Dataiku platform create new classes on its own while running a classification algorithm?

ShrishtiNeogi
Level 1
Does the Dataiku platform create new classes like "Others" on its own while running a classification algorithm such as random forest? The top important variables show "Others" as an important class, although I haven't created any such class. Could anyone please explain this?

AlexT
Dataiker

Hi @ShrishtiNeogi ,

"Others" is always present in dummification (one-hot encoding) because the model has to handle the possibility of a value for this column that was not seen during training. It is created even when every other value has its own dummy.

It can also show up in feature importance because some dummies can be dropped depending on the dummification settings, for example when only the 10 most frequent values are kept. The dropped values are then represented by "Others".

Suppose the 12th most frequent value is really impactful. In that case, the only way to see that dummy on its own in the feature importance is to increase the number of dummies.
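
To make the idea concrete, here is a minimal plain-Python sketch of top-N dummification with an "Others" bucket. It is not Dataiku's actual implementation: the helper names `fit_top_n` and `encode`, the sample data, and the cutoff of 2 are all assumptions made up for illustration.

```python
from collections import Counter

def fit_top_n(values, top_n):
    """Return the top_n most frequent categories seen in training (hypothetical helper)."""
    return [cat for cat, _ in Counter(values).most_common(top_n)]

def encode(value, kept):
    """One-hot encode a value; anything not in `kept` lands in 'Others' (hypothetical helper)."""
    row = {cat: int(value == cat) for cat in kept}
    # "Others" always exists, even if no training value ever fell into it.
    row["Others"] = int(value not in kept)
    return row

# Illustrative training column: 4 distinct values, but only 2 dummies are kept.
train = ["red"] * 5 + ["blue"] * 4 + ["green"] * 2 + ["teal"]
kept = fit_top_n(train, top_n=2)

encoded_green = encode("green", kept)    # seen in training, but outside the top 2
encoded_purple = encode("purple", kept)  # never seen in training
print(kept, encoded_green, encoded_purple)
```

Here both "green" and "purple" are encoded as "Others", so any predictive signal carried by "green" would be attributed to "Others" in a feature importance chart. Raising `top_n` in this sketch gives "green" its own dummy, which mirrors increasing the number of dummies in the settings.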

Hope that helps! 
