
Categorical feature encoding | High cardinality

adf057
Level 2

Hi Dataiku folks,

I have a question. My predictor variable ‘product id’ has more than 300 distinct values, and the data consists of multiple time series, one per product.

What kind of encoding should I apply?

One-hot encoding will greatly increase the feature space, and with label encoding the model does not perform as well.
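
To make the trade-off concrete, here is a minimal sketch in plain Python of how wide each encoding gets. The product ids (`P000` to `P299`) and the helper names `label_encode` and `one_hot_encode` are made up for illustration, and 300 is only an assumed count; this is not my actual data or pipeline.

```python
# Hypothetical data: 300 made-up product ids standing in for the real ones.
product_ids = [f"P{i:03d}" for i in range(300)]

# Map each distinct id to an integer position (sorted for reproducibility).
label_map = {pid: idx for idx, pid in enumerate(sorted(set(product_ids)))}

def label_encode(pid):
    # Label encoding: a single integer column.
    # The integers imply an ordering between products that has no real meaning.
    return [label_map[pid]]

def one_hot_encode(pid):
    # One-hot encoding: one 0/1 column per distinct product id.
    row = [0] * len(label_map)
    row[label_map[pid]] = 1
    return row

label_width = len(label_encode("P007"))
one_hot_width = len(one_hot_encode("P007"))
print(label_width, one_hot_width)
```

With label encoding the product becomes one column with an arbitrary numeric order, while one-hot adds one sparse column per product on top of the price-related features.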


My goal is to identify anomalies in product pricing using an isolation forest. I also have additional price-related features, which makes this a multivariate problem.

 

Appreciate your thoughts.
