How to set (or remove) the default maximum number of categories in modeling?

okiriza Registered Posts: 5 ✭✭✭✭
For categorical variables, there is a "Max. Nb. Categories" field, which I think is set to 100 by default.

Is there any way to remove this default maximum number of categories or set the default to a different value?

Thanks.

Best Answer

  • Mattsco
    Mattsco Dataiker, Registered Posts: 125 Dataiker
    Answer ✓
    Hi,

    No, you can't. This limit helps avoid out-of-memory (RAM) errors when you train models.
    I suggest using "hashing" encoding instead.

    A sparse matrix will be built. Note that in scikit-learn, only some algorithms accept sparse matrices.

    To do this easily, sort your features by type, select all your categorical features, and click Hashing instead of Dummy-encode.

    Matt
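
To illustrate why hashing sidesteps the category limit, here is a minimal pure-Python sketch of the general "hashing trick". It is not taken from the answer and does not claim to reproduce how the tool implements its Hashing option: the helper names `hash_bucket` and `hash_encode`, the choice of MD5, the bucket count of 16, and the sample rows are all assumptions made for illustration. The point it shows is that the encoded width is fixed by the number of buckets, not by the number of distinct categories, and that each row only stores its non-zero entries.

```python
import hashlib


def hash_bucket(column, value, n_buckets):
    """Map a (column, value) pair to a stable bucket index in [0, n_buckets)."""
    # Hypothetical scheme: hash "column=value" so that equal values in
    # different columns usually land in different buckets.
    key = f"{column}={value}".encode("utf-8")
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


def hash_encode(rows, n_buckets=16):
    """Encode rows of categorical values as sparse {bucket_index: count} dicts."""
    encoded = []
    for row in rows:
        sparse = {}
        for column, value in row.items():
            idx = hash_bucket(column, value, n_buckets)
            # Distinct categories may collide in the same bucket; counts add up.
            sparse[idx] = sparse.get(idx, 0) + 1
        encoded.append(sparse)
    return encoded


# Made-up sample data, for illustration only.
rows = [
    {"city": "Paris", "browser": "Firefox"},
    {"city": "Jakarta", "browser": "Chrome"},
    {"city": "Paris", "browser": "Chrome"},
]
encoded = hash_encode(rows, n_buckets=16)

for row, sparse in zip(rows, encoded):
    print(row, "->", sparse)
```

Adding more distinct categories never widens the encoding; it only makes collisions more likely, which is the trade-off against dummy encoding. In scikit-learn, `sklearn.feature_extraction.FeatureHasher` provides a comparable encoder that returns a SciPy sparse matrix.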