
How to set (or remove) default maximum number of categories in modeling?

Solved!
okiriza
Level 2
For categorical variables, there is a "Max. Nb. Categories" field, which I think is set to 100 by default.

Is there any way to remove this default maximum number of categories or set the default to a different value?

Thanks.
Mattsco
Dataiker
Hi,

No, you can't. This limit is useful for avoiding out-of-memory (RAM) errors when you train models.
I suggest using "hashing" encoding instead.

With hashing, a sparse matrix will be built. Note that in scikit-learn only some algorithms accept sparse matrices.

To do this easily, you can sort your features by type, select all your categorical features, and click Hashing instead of Dummy-encode.
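For anyone curious what hashing encoding looks like outside the DSS interface, here is a minimal sketch using scikit-learn's `FeatureHasher`. It is only an illustration of the general technique, not necessarily how DSS implements it internally. The data is synthetic, the column name `city` is hypothetical, and `n_features=256` is an arbitrary choice. It shows that a high-cardinality column becomes a fixed-width sparse matrix, which is then passed to an estimator that accepts sparse input (here `LogisticRegression`).

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction import FeatureHasher
from sklearn.linear_model import LogisticRegression

# Synthetic data (assumption): one categorical column with up to 5,000 distinct values
rng = np.random.default_rng(0)
categories = [f"city_{i}" for i in rng.integers(0, 5000, size=2000)]
labels = rng.integers(0, 2, size=2000)

# Each row is a list of "column=value" strings, one per categorical feature
rows = [[f"city={c}"] for c in categories]

# Hash every category into a fixed number of columns (256 is arbitrary here),
# so the output width does not grow with the number of distinct categories
hasher = FeatureHasher(n_features=256, input_type="string")
X = hasher.transform(rows)  # SciPy sparse matrix, one non-zero entry per row

# LogisticRegression accepts sparse input directly; not every estimator does
model = LogisticRegression(max_iter=200).fit(X, labels)
```

The trade-off is that different categories can collide in the same column, so a larger `n_features` reduces collisions at the cost of a wider matrix.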

Matt
Mattsco
