
How to encode categorical variables for modeling?

Dataiker
Is it okay to use a categorical variable directly, or is it better to apply one-hot encoding first?
Dataiker Alumni
It's OK to use a categorical variable directly. One-hot encoding, also called dummification, is applied automatically when the model is trained.
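
For anyone unsure what dummification produces, here is a minimal plain-Python sketch. It is only an illustration of the idea, not the code Dataiku runs; the function name `one_hot_encode` and the sample `colors` list are made up for this example.

```python
def one_hot_encode(values):
    """Turn a list of category labels into 0/1 rows, one column per distinct label."""
    # Sort the distinct labels so the column order is deterministic.
    categories = sorted(set(values))
    column_of = {label: i for i, label in enumerate(categories)}

    rows = []
    for value in values:
        row = [0] * len(categories)
        row[column_of[value]] = 1  # exactly one "hot" column per row
        rows.append(row)
    return categories, rows


# Hypothetical sample data.
colors = ["red", "green", "red", "blue"]
categories, encoded = one_hot_encode(colors)

for label, row in zip(colors, encoded):
    print(label, row)
```

Each distinct value becomes its own binary column, so a variable with many distinct values produces many columns.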

In the project "Settings", you can choose between one-hot encoding and "impact encoding". For text variables, there are other options available: tf-idf, hashing, etc.
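
As a rough illustration of impact encoding, the sketch below replaces each category with the mean of the target observed for that category. This is the basic idea only: the exact computation in Dataiku may differ (for example, smoothing for rare categories), and the function name `impact_encode` and the sample data are hypothetical.

```python
from collections import defaultdict


def impact_encode(categories, targets):
    """Replace each category with the mean target value seen for that category."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1

    # Per-category mean of the target (no smoothing in this sketch).
    means = {category: sums[category] / counts[category] for category in sums}
    return [means[category] for category in categories], means


# Hypothetical sample data: a city column and a binary target.
cities = ["Paris", "Lyon", "Paris", "Lyon", "Nice"]
churned = [1, 0, 0, 0, 1]

encoded, means = impact_encode(cities, churned)
print(means)
print(encoded)
```

Unlike one-hot encoding, this yields a single numeric column however many distinct values the variable has. Because it uses the target, the means should be computed on the training data only.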