How to set (or remove) default maximum number of categories in modeling?

okiriza
Level 2
For categorical variables, there is a "Max. Nb. Categories" field, which I think is set to 100 by default.

Is there any way to remove this default maximum number of categories or set the default to a different value?
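To make the trade-off concrete, here is a minimal pure-Python sketch of what a cap such as "Max. Nb. Categories" does conceptually for dummy encoding: only the N most frequent levels get their own 0/1 column, so the encoded width stays bounded. The helpers `cap_categories` and `dummy_encode` and the `__other__` bucket are hypothetical; DSS's own handling of levels beyond the cap may differ (for example, it may drop them rather than group them).

```python
from collections import Counter

OTHER = "__other__"  # hypothetical name for the catch-all bucket

def cap_categories(values, max_categories):
    """Keep the max_categories most frequent levels; map every other level to OTHER."""
    counts = Counter(values)
    kept = {level for level, _ in counts.most_common(max_categories)}
    return [v if v in kept else OTHER for v in values]

def dummy_encode(values):
    """Return (column names, dense 0/1 rows) with one column per distinct level."""
    columns = sorted(set(values))
    index = {level: i for i, level in enumerate(columns)}
    rows = []
    for v in values:
        row = [0] * len(columns)  # dense: every row stores every column
        row[index[v]] = 1
        rows.append(row)
    return columns, rows

cities = ["paris", "lyon", "paris", "nice", "lille", "paris", "lyon"]
capped = cap_categories(cities, max_categories=2)
columns, rows = dummy_encode(capped)
print(columns)  # ['__other__', 'lyon', 'paris']
```

Without the cap, every new level adds another column to every row, which is what makes high-cardinality columns expensive.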

Thanks.
Mattsco
Dataiker
Hi,

No, you can't. This limit is there to avoid running out of RAM when you train models.
Instead, I suggest using "hashing" encoding.
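The RAM argument is simple arithmetic: a dense dummy-encoded matrix stores one value per row per category, so its size grows linearly with the number of levels. The sketch below assumes dense float64 storage (8 bytes per value) and a hypothetical dataset of one million rows; the function name `dense_dummy_bytes` is illustrative and not part of DSS.

```python
def dense_dummy_bytes(n_rows, n_categories, bytes_per_value=8):
    """Bytes needed for a dense dummy-encoded block: one value per row per category.

    bytes_per_value=8 assumes float64 storage, which is an assumption here,
    not DSS's documented in-memory layout.
    """
    return n_rows * n_categories * bytes_per_value

GB = 10**9
capped = dense_dummy_bytes(1_000_000, 100)       # with a cap of 100 levels
uncapped = dense_dummy_bytes(1_000_000, 50_000)  # a hypothetical high-cardinality column
print(capped / GB, uncapped / GB)  # 0.8 400.0
```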

A sparse matrix will be built. Note that in scikit-learn, only some algorithms accept sparse matrices as input.
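Hashing sidesteps the cap because the number of columns is fixed in advance, and each row stores only its non-zero entries. Below is a minimal pure-Python sketch of the hashing trick, assuming rows arrive as dicts of categorical values; `bucket`, `hash_encode`, the `feature=value` token format and `n_features=2**10` are illustrative choices, not DSS's implementation.

```python
import hashlib

def bucket(token, n_features):
    """Map a string to a column index in [0, n_features) using a stable hash.

    Python's built-in hash() is salted per process for strings, so md5 is
    used here to get the same column on every run.
    """
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % n_features

def hash_encode(rows, n_features):
    """Hashing trick: each row becomes a sparse {column: count} dict.

    Only non-zero entries are stored, and the width is fixed at n_features
    no matter how many distinct categories appear in the data.
    """
    encoded = []
    for row in rows:
        sparse = {}
        for feature, value in row.items():
            col = bucket(f"{feature}={value}", n_features)
            # colliding tokens share a column; their counts add up
            sparse[col] = sparse.get(col, 0) + 1
        encoded.append(sparse)
    return encoded

rows = [{"city": "paris", "device": "mobile"},
        {"city": "lyon", "device": "desktop"},
        {"city": "user_123456", "device": "mobile"}]
encoded = hash_encode(rows, n_features=2**10)
print(encoded)
```

In scikit-learn itself, `sklearn.feature_extraction.FeatureHasher` implements the same idea (using MurmurHash3 and, by default, alternating signs to reduce collision bias, which this sketch omits) and returns a SciPy sparse matrix. Linear models such as `LogisticRegression` accept that matrix directly; an estimator that only takes dense arrays would need it densified, which brings the memory problem back.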

To do this easily, sort your features by type, select all your categorical features, and click Hashing instead of Dummy-encode.

Matt
