
How to encode categorical variables for modeling?

UserBird
Dataiker
Is it okay to use a categorical variable directly, or is it better to apply one-hot encoding first?
jrouquie
Dataiker Alumni
It's OK to use a categorical variable directly. One-hot encoding is applied to it automatically before the model is trained. This is also called dummification.
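
For anyone who wants to see what one-hot encoding does to a column, here is a minimal sketch using pandas. It is only an illustration of the technique, not what the tool runs internally; the `one_hot` helper, the `color` and `price` columns, and the sample values are all made up for the example.

```python
import pandas as pd


def one_hot(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Replace `column` with one 0/1 indicator column per distinct category."""
    # Hypothetical helper: pd.get_dummies builds one indicator column per value.
    dummies = pd.get_dummies(df[column], prefix=column, dtype=int)
    return pd.concat([df.drop(columns=[column]), dummies], axis=1)


# Made-up sample data: one categorical column and one numeric column.
df = pd.DataFrame(
    {"color": ["red", "green", "red", "blue"], "price": [10, 12, 9, 15]}
)
encoded = one_hot(df, "color")
print(encoded)
```

Each row ends up with a 1 in exactly one of the `color_*` columns, so the model sees only numeric inputs.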

You can choose in the project "Settings" between one-hot encoding and "impact encoding". For text variables, there are other options available: tf-idf, hashing, etc.
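
To give a rough idea of impact encoding: each category is replaced by a statistic of the target for that category, typically its mean. The sketch below is a simplified pandas version under that assumption and may differ from what the tool actually computes; the `impact_encode` helper, its `smoothing` parameter, and the `city` and `churned` columns are hypothetical.

```python
import pandas as pd


def impact_encode(
    df: pd.DataFrame, column: str, target: str, smoothing: float = 0.0
) -> pd.Series:
    """Map each category of `column` to the (optionally smoothed) mean of `target`."""
    global_mean = df[target].mean()
    stats = df.groupby(column)[target].agg(["mean", "count"])
    # With smoothing > 0, rare categories are pulled toward the global mean.
    smoothed = (stats["mean"] * stats["count"] + global_mean * smoothing) / (
        stats["count"] + smoothing
    )
    return df[column].map(smoothed)


# Made-up sample data: a categorical feature and a binary target.
df = pd.DataFrame(
    {"city": ["A", "A", "B", "B", "B", "C"], "churned": [1, 0, 1, 1, 0, 0]}
)
df["city_impact"] = impact_encode(df, "city", "churned")
print(df)
```

Unlike one-hot encoding, this yields a single numeric column regardless of how many categories there are, which helps with high-cardinality variables. In practice the per-category means should be computed on the training data only, so that the target does not leak into the test set.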