ML Model Scoring - How are new values for categorical columns handled?

Solved!
Marlan
Hi all,

I am trying to understand how new values for categorical columns are handled by ML models. By new values, I mean values that weren't included in the training data. 

I am using one-hot encoding in the Visual ML settings, with the minimum samples and max value settings, so presumably there will be an Other category that collects all the values that occur infrequently in the training data.

Here's my specific question: are the new values put into the Other category when scoring new records?

Anyone have any insights on this?

Thanks,

Marlan


Operating system used: Red Hat

1 Solution
AdrienL
Dataiker

Hi,

Yes, categorical values not seen during training are put in the Other category.
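
To illustrate the behavior described above, here is a minimal sketch in plain Python. It is not Dataiku's implementation: the function names, the `__other__` label, and the `min_samples` / `max_categories` parameters are hypothetical stand-ins for the settings mentioned in the question. It only shows the general idea that a value absent from the retained training categories is encoded in the same catch-all column as the infrequent training values.

```python
from collections import Counter

OTHER = "__other__"  # hypothetical label for the catch-all bucket


def fit_categories(values, min_samples=2, max_categories=3):
    """Keep the most frequent values that reach min_samples; add OTHER last."""
    counts = Counter(values)
    frequent = [v for v, n in counts.most_common() if n >= min_samples]
    return frequent[:max_categories] + [OTHER]


def one_hot(value, categories):
    """Encode one value; anything not retained at fit time goes to OTHER."""
    key = value if value in categories[:-1] else OTHER
    return [1 if c == key else 0 for c in categories]


train = ["red", "red", "blue", "blue", "blue", "green"]
categories = fit_categories(train)

# "green" was seen in training but is too rare; "purple" was never seen.
rare_seen = one_hot("green", categories)
never_seen = one_hot("purple", categories)
known = one_hot("red", categories)
```

In this sketch the rare training value and the never-seen value get the same encoding, so the model treats both as Other at scoring time.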
