Handling categorical variable | How to apply LabelEncoding and OneHotEncoding in DSS
fk
Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 4 Partner
Hi Team,
Can someone tell me how I can apply LabelEncoding to a categorical variable?
Answers
Hi @fk
I'll borrow the answer from one of our devs.
- Create a file in your project library, under the python folder, and call it my_label_encoder.py
- Populate it with this modification of the scikit-learn LabelEncoder:
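    from sklearn import preprocessing

    class MyLabelEncoder(preprocessing.LabelEncoder):
        def transform(self, X):
            # The parent transform returns a 1-D array of integer labels
            transformed_X = super(MyLabelEncoder, self).transform(X)
            # Reshape to a single-column 2-D array of shape (n_samples, 1)
            return transformed_X.reshape(transformed_X.shape[0], 1)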
- Then, in the Design tab of the ML task of your analysis, under feature handling, select the relevant feature and check that it's categorical.
- Under Category Handling, select "custom preprocessing" and enter this:
    from my_label_encoder import MyLabelEncoder
    processor = MyLabelEncoder()
The reason to do it this way is that the vanilla encoder returns a one-dimensional array, but DSS expects a 2-D array. As you can see, the modification of the encoder just reshapes the output into a single column.
(thanks Nico!)
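If you want to check the shape difference outside DSS, here is a minimal sketch that compares the vanilla scikit-learn LabelEncoder with the subclass from the steps above. The sample values in `colors` are made up for illustration, and this only shows the array shapes; it does not reproduce how DSS calls the processor internally.

```python
import numpy as np
from sklearn import preprocessing


class MyLabelEncoder(preprocessing.LabelEncoder):
    def transform(self, X):
        # Parent transform gives a 1-D array of integer labels
        transformed_X = super(MyLabelEncoder, self).transform(X)
        # Reshape into one column: (n_samples,) -> (n_samples, 1)
        return transformed_X.reshape(transformed_X.shape[0], 1)


# Hypothetical sample of a categorical feature
colors = np.array(["red", "green", "blue", "green"])

vanilla = preprocessing.LabelEncoder().fit(colors)
custom = MyLabelEncoder().fit(colors)

vanilla_out = vanilla.transform(colors)
custom_out = custom.transform(colors)

print(vanilla_out.shape)  # (4,)   one-dimensional
print(custom_out.shape)   # (4, 1) two-dimensional, single column
```

Both encoders assign the same integer labels (classes are sorted alphabetically, so blue=0, green=1, red=2); only the shape of the returned array differs.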