ML Model Scoring - How are new values for categorical columns handled?

Marlan

Hi all,

I am trying to understand how new values for categorical columns are handled by ML models. By new values, I mean values that weren't included in the training data.

I am using one-hot encoding in the Visual ML settings, with the minimum samples and max value settings applied, so presumably the infrequently occurring values in the training data are grouped into an Other category.

Here's my specific question: are the new values put into the Other category when scoring new records?
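
To make the question concrete, here is a minimal Python sketch of the behavior I am assuming. It is not Dataiku's implementation: the function names, the `__other__` label, and the `min_samples` / `max_values` parameters are hypothetical stand-ins for the Visual ML settings, and whether DSS actually behaves this way at scoring time is exactly what I am asking.

```python
from collections import Counter

OTHER = "__other__"  # hypothetical label for the catch-all category


def fit_categories(values, min_samples=2, max_values=3):
    """Keep categories seen at least min_samples times, capped at max_values."""
    counts = Counter(values)
    frequent = [v for v, n in counts.most_common() if n >= min_samples]
    return frequent[:max_values]


def one_hot(value, kept):
    """Encode one value; anything not in `kept` lands in the Other column."""
    columns = kept + [OTHER]
    target = value if value in kept else OTHER
    return [1 if c == target else 0 for c in columns]


train = ["red", "red", "blue", "blue", "blue", "green"]
kept = fit_categories(train)          # "green" is too rare to get its own column
rare_row = one_hot("green", kept)     # rare in the training data
unseen_row = one_hot("purple", kept)  # never seen in the training data
```

Under this assumption, a value that was rare in training and a value that never appeared in training would end up with the same encoding at scoring time.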

Anyone have any insights on this?

Thanks,

Marlan


Operating system used: Red Hat



Best Answer
