How to set (or remove) default maximum number of categories in modeling?

Solved!
okiriza
Level 2
For categorical variables, there is a "Max. Nb. Categories" field, which I think is set to 100 by default.

Is there any way to remove this default maximum number of categories or set the default to a different value?

Thanks.
1 Solution
Mattsco
Dataiker
Hi,

No, you can't. This limit helps avoid out-of-memory (RAM) errors when you train models.
I suggest you use "hashing" encoding instead.

A sparse matrix will be built. Note that in scikit-learn only some algorithms accept sparse matrices.
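To illustrate the idea outside DSS, here is a minimal sketch using scikit-learn's FeatureHasher. It is not necessarily what DSS does internally. The column values, the labels, and the choice of 64 hashed features are made up for the example.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction import FeatureHasher
from sklearn.linear_model import LogisticRegression

# Hypothetical categorical column with 500 distinct values (illustrative data only)
cities = [f"city_{i % 500}" for i in range(2000)]
labels = [i % 2 for i in range(2000)]

# Hash each category into a fixed number of columns (64 is an arbitrary choice),
# so the matrix width does not grow with the number of distinct categories
hasher = FeatureHasher(n_features=64, input_type="string")
X = hasher.transform([[c] for c in cities])  # one single-token sample per row

print(issparse(X), X.shape)  # the result is a SciPy sparse matrix

# LogisticRegression is one of the estimators that accepts sparse input
model = LogisticRegression(max_iter=1000).fit(X, labels)
```

The trade-off is that different categories can collide in the same hashed column, so a larger number of hashed features reduces collisions at the cost of a wider matrix.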

To do this easily, you can sort your features by type, select all your categorical features, and click Hashing instead of Dummy-encode.

Matt
