How to set (or remove) default maximum number of categories in modeling?

Solved!
okiriza
Level 2
For categorical variables, there is a "Max. Nb. Categories" field, which I think is set to 100 by default.

Is there any way to remove this default maximum, or to set the default to a different value?

Thanks.
1 Solution
Mattsco
Dataiker
Hi,

No, you can't. This limit helps avoid out-of-memory (RAM) errors when you train models.
I suggest you use "hashing" encoding instead.

A sparse matrix will be built. Note that in scikit-learn, only some algorithms accept sparse matrices.
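
As a rough illustration of what hashing encoding does, here is a minimal sketch in plain Python. It is not Dataiku's actual implementation: the function names `hash_bucket` and `hash_encode`, the sample rows, and the bucket count of 16 are all made up for this example.

```python
import hashlib


def hash_bucket(column: str, value: str, n_buckets: int) -> int:
    """Map a (column, value) pair to a stable index in [0, n_buckets)."""
    key = f"{column}={value}".encode("utf-8")
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


def hash_encode(rows, n_buckets=16):
    """Encode rows (dicts of categorical values) as sparse {index: count} dicts."""
    encoded = []
    for row in rows:
        sparse = {}  # only non-zero entries are stored
        for column, value in row.items():
            idx = hash_bucket(column, value, n_buckets)
            sparse[idx] = sparse.get(idx, 0) + 1
        encoded.append(sparse)
    return encoded


# Hypothetical sample data with two categorical features.
rows = [
    {"city": "Paris", "browser": "Firefox"},
    {"city": "Tokyo", "browser": "Chrome"},
    {"city": "Paris", "browser": "Chrome"},
]
encoded = hash_encode(rows, n_buckets=16)
```

The width of the encoded data is fixed by `n_buckets`, however many distinct categories appear, which is why the memory cost stays bounded. The trade-off is that two different values can land in the same bucket. If you want to try the same idea outside Dataiku, scikit-learn ships a `FeatureHasher` class in `sklearn.feature_extraction` that returns a SciPy sparse matrix.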

To do this easily, sort your features by type, select all your categorical features, and click Hashing instead of Dummy-encode.

Matt