Handling categorical variable | How to apply LabelEncoding and OneHotEncoding in DSS

fk Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 4 Partner

Hi Team,

Can someone tell me how I can apply LabelEncoding to a categorical variable?

Answers

  • Liev
    Liev Dataiker Alumni Posts: 176 ✭✭✭✭✭✭✭✭
    edited July 17

    Hi @fk

    I'll borrow the answer from one of our devs.

    - Create a file in your project library, under the python folder. Call it my_label_encoder.py

    - Populate it with this modification of the scikit-learn LabelEncoder:

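    from sklearn import preprocessing

    class MyLabelEncoder(preprocessing.LabelEncoder):
        def transform(self, X):
            # The stock LabelEncoder returns a 1-D array of integer codes.
            transformed_X = super(MyLabelEncoder, self).transform(X)
            # DSS expects 2-D output, so reshape to a single column: (n_samples, 1).
            return transformed_X.reshape(transformed_X.shape[0], 1)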

    - Then in the Design tab of the ML task of your analysis, under feature handling, select the relevant feature, check that it's categorical.

    - Under Category Handling, select "custom preprocessing" and enter this:

    from my_label_encoder import MyLabelEncoder
    processor = MyLabelEncoder()

    The reason for doing it this way is that the vanilla encoder's transform returns a one-dimensional array of shape (n_samples,), but DSS expects a 2-D array. As you can see, the modified encoder just reshapes the output into a single column of shape (n_samples, 1).
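
    If you want to see the shape difference for yourself outside DSS, here is a minimal sketch. The sample values are made up for illustration, and the class is repeated only so the snippet runs on its own. It assumes the processor is fitted with fit and then called through transform, which is the only method the subclass overrides.

```python
import numpy as np
from sklearn import preprocessing


class MyLabelEncoder(preprocessing.LabelEncoder):
    """Same subclass as in the answer, repeated so this snippet is standalone."""

    def transform(self, X):
        transformed_X = super(MyLabelEncoder, self).transform(X)
        return transformed_X.reshape(transformed_X.shape[0], 1)


# Hypothetical sample values, not taken from any real dataset.
colors = np.array(["red", "green", "blue", "green"])

# Fit both encoders on the same values.
plain = preprocessing.LabelEncoder().fit(colors)
wrapped = MyLabelEncoder().fit(colors)

plain_out = plain.transform(colors)      # 1-D array of integer codes
wrapped_out = wrapped.transform(colors)  # same codes, as a single column

print(plain_out.shape)    # (4,)
print(wrapped_out.shape)  # (4, 1)
```

    The integer codes are identical in both cases (classes are numbered in sorted order); only the shape changes.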

    (thanks Nico!)
