ML Model Scoring - How are new values for categorical columns handled?
Marlan
Hi all,
I am trying to understand how new values for categorical columns are handled by ML models. By new values, I mean values that weren't included in the training data.
I am using one-hot encoding in the Visual ML settings, with the minimum samples and max values options set, so presumably there will be an Other category for all the infrequently occurring values in the training data.
Here's my specific question: are the new values put into the Other category when scoring new records?
Does anyone have any insights on this?
Thanks,
Marlan
Operating system used: Red Hat
Best Answer
Hi,
Yes, categorical values not seen during training are put in the Other category.
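To make the behaviour concrete, here is a minimal sketch in plain Python of one-hot encoding with an Other bucket. It is only an illustration of the idea, not Dataiku's actual implementation: the function names, the `min_samples` and `max_values` parameters, and the sample data are all hypothetical.

```python
from collections import Counter

OTHER = "Other"  # hypothetical label for the catch-all bucket


def fit_categories(values, min_samples=2, max_values=3):
    """Keep the most frequent training values; everything else goes to Other."""
    counts = Counter(values)
    # most_common() sorts by descending count
    frequent = [v for v, n in counts.most_common() if n >= min_samples]
    return frequent[:max_values] + [OTHER]


def one_hot(value, categories):
    """Encode one value; rare or unseen values fall into the Other column."""
    key = value if value in categories[:-1] else OTHER
    return [1 if c == key else 0 for c in categories]


# Illustrative training data: "green" is too rare to get its own column.
train = ["red", "red", "blue", "blue", "blue", "green"]
categories = fit_categories(train)

seen = one_hot("red", categories)       # frequent in training
rare = one_hot("green", categories)     # in training, but below min_samples
unseen = one_hot("purple", categories)  # never seen during training
```

In this sketch, `rare` and `unseen` produce the same vector, which mirrors the answer above: at scoring time a value the model has never seen is treated the same way as an infrequent training value.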