Does the Dataiku platform create new classes on its own while running a classification algorithm?

ShrishtiNeogi Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 2 Partner

Does the Dataiku platform create new classes like "Others" on its own when running a classification algorithm such as random forest? Among the most important variables, I can see "others" displayed as an important class, although I haven't created any such class. Could anyone please explain this?

Answers

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,209 Dataiker

    Hi @ShrishtiNeogi,

    "others" is always present in dummification (one-hot encoding) because the model needs a place for any value of this column that was not seen during training. It is created even when every value seen in training has its own dummy.

    It can also appear in feature importance because some dummies can be dropped depending on the dummification settings, for example when only the 10 most frequent values are kept.

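    Suppose the 12th most frequent value happens to be really impactful: because it has no dummy of its own, its effect shows up under "others". In that case, the only way to see that value as a separate dummy in the feature importance is to increase the number of dummies.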

    Hope that helps!
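
To make the answer above concrete, here is a minimal sketch in plain Python of one-hot encoding with a cap on the number of dummies. It is not Dataiku's implementation: the function names, the `color` column, and the `column:value` naming are all assumptions made for illustration. It only shows why both rare and never-seen values end up in an "others" column.

```python
from collections import Counter


def fit_top_n_dummies(values, max_dummies):
    """Return the categories that get their own dummy, most frequent first."""
    counts = Counter(values)
    return [cat for cat, _ in counts.most_common(max_dummies)]


def encode(value, kept_categories, column="color"):
    """One-hot encode one value; anything not kept lands in '<column>:others'."""
    # Hypothetical naming scheme: one 0/1 entry per kept category, plus "others".
    row = {f"{column}:{cat}": 0 for cat in kept_categories}
    row[f"{column}:others"] = 0
    if value in kept_categories:
        row[f"{column}:{value}"] = 1
    else:
        row[f"{column}:others"] = 1
    return row


# Toy training column with five distinct values of decreasing frequency.
train = ["red"] * 5 + ["blue"] * 4 + ["green"] * 3 + ["teal"] * 2 + ["pink"]

# Keep dummies only for the 3 most frequent values.
kept = fit_top_n_dummies(train, max_dummies=3)

rare = encode("teal", kept)      # seen in training, but outside the top 3
unseen = encode("purple", kept)  # never seen in training
common = encode("red", kept)     # has its own dummy
```

In this sketch, "teal" and "purple" both set the `color:others` entry, so if "teal" were strongly predictive, a model trained on these columns would attribute that signal to "others". Raising `max_dummies` would give "teal" its own column.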
