
Handling categorical variable | How to apply LabelEncoding and OneHotEncoding in DSS


Hi Team,

Can someone tell me how to apply LabelEncoding to a categorical variable?

Dataiker

Hi @fk 

I'll borrow the answer from one of our devs.

- Create a file in your project library, under the python folder, and call it my_label_encoder.py

- Populate it with this modification of the scikit-learn LabelEncoder:

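from sklearn import preprocessing

class MyLabelEncoder(preprocessing.LabelEncoder):
    """LabelEncoder whose transform() returns a 2-D column instead of a 1-D array."""
    def transform(self, X):
        # The stock LabelEncoder returns an array of shape (n_samples,)
        transformed_X = super(MyLabelEncoder, self).transform(X)
        # Reshape it to (n_samples, 1)
        return transformed_X.reshape(transformed_X.shape[0], 1)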

 

- Then, in the Design tab of the ML task of your analysis, under feature handling, select the relevant feature and check that it is set to categorical.

- Under Category Handling, select "custom preprocessing" and enter this:

from my_label_encoder import MyLabelEncoder
processor = MyLabelEncoder()

 

The reason for doing it this way is that the vanilla encoder returns a one-dimensional array, but DSS expects a 2-D array. The modified encoder simply reshapes the output from shape (n_samples,) to (n_samples, 1).
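If you want to see the shape difference outside DSS, here is a minimal sketch you could run in a notebook. It assumes scikit-learn and NumPy are installed; the `colors` array is made-up sample data for illustration, not anything taken from your dataset.

```python
import numpy as np
from sklearn import preprocessing


class MyLabelEncoder(preprocessing.LabelEncoder):
    def transform(self, X):
        transformed_X = super(MyLabelEncoder, self).transform(X)
        return transformed_X.reshape(transformed_X.shape[0], 1)


# Hypothetical sample values, for illustration only.
colors = np.array(["red", "green", "blue", "green"])

# Fit both encoders on the same values.
vanilla = preprocessing.LabelEncoder().fit(colors)
custom = MyLabelEncoder().fit(colors)

vanilla_out = vanilla.transform(colors)
custom_out = custom.transform(colors)

print(vanilla_out.shape)  # (4,)   one-dimensional
print(custom_out.shape)   # (4, 1) one column, one row per sample
```

The encoded values are the same in both cases; only the shape of the array changes.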

(thanks Nico!)
