Ordinal Feature Handling in Visual ML

As an analyst using Visual ML, I'd like to be able to treat some of my categorical variables as ordinal variables, that is, categories that have an order, so that I can represent my features more accurately and hopefully build better models.

I believe that scikit-learn already has this idea in the underlying libraries.

In fact, I'd like to see this idea carried through the entire system, including charts.

An example might be the categories

s, m, l, xl, xxl, xxxl

There is a meaning to this order.  

5 Comments
tgb417
Neuron

Maybe this is something that I can already do with a custom processor. However, that is not immediately apparent to an analyst persona.

Krishna
Dataiker
Status changed to: Acknowledged

Thanks @tgb417, that's a good suggestion.

tgb417
Neuron

Thanks @Krishna.

While you are considering adding this to Visual ML in DSS, is there any way today to do this with a custom preprocessor on an existing categorical variable?

Krishna
Dataiker

Great question. It is possible to use sklearn.preprocessing.OrdinalEncoder (see the scikit-learn 0.20.4 documentation) as a custom preprocessor, but it's not fully supported: some features may have issues (e.g. partial dependence plots or subpopulation analysis) because the sklearn 0.20 version lacks unknown-value handling (i.e. for categories unseen at fit time but encountered during transformation). This is improved in the latest version of sklearn; however, Visual ML does not yet support 0.24 due to some significant changes in its API (which we are working on).
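To make the unknown-value point concrete, here is a minimal standalone sketch of OrdinalEncoder with an explicit category order. It assumes scikit-learn 0.24 or later, where the `handle_unknown` and `unknown_value` parameters exist; on 0.20 those two arguments are not available. The name `SIZE_ORDER` and the choice of -1 for unseen values are illustrative assumptions, and how the encoder would be wired into a DSS custom preprocessor is not shown.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Explicit order taken from the example in the original post.
SIZE_ORDER = ["s", "m", "l", "xl", "xxl", "xxxl"]

# Assumption: scikit-learn >= 0.24, which added handle_unknown / unknown_value.
# -1 is an arbitrary marker chosen here for categories unseen at fit time.
encoder = OrdinalEncoder(
    categories=[SIZE_ORDER],
    handle_unknown="use_encoded_value",
    unknown_value=-1,
)

# One column of size labels; OrdinalEncoder expects a 2D array.
train = np.array([["m"], ["s"], ["xxl"]], dtype=object)
encoder.fit(train)

# "xs" is not in SIZE_ORDER, so it is encoded as the unknown marker.
encoded = encoder.transform(np.array([["s"], ["xl"], ["xs"]], dtype=object))
print(encoded.ravel().tolist())
```

Passing `categories` explicitly is what makes the encoding ordinal in the intended sense; without it, the encoder orders string categories alphabetically.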

JeremieP
Dataiker

Hi @tgb417,

The best way to do ordinal encoding would be to use a Prepare recipe before the ML part of your flow. In the Prepare recipe, you can use a "Find and replace" processor to map your string values to numbers. For your example, you would map S to 0, M to 1, L to 2, etc.
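For illustration only, the same mapping can be sketched in plain Python. This is not DSS code: `SIZE_TO_RANK` and `encode_size` are hypothetical names, and lowercasing the labels and returning `None` for unmapped values are assumptions made for this sketch.

```python
# Ordered size labels from the original post; position in the list is the rank.
SIZE_ORDER = ["s", "m", "l", "xl", "xxl", "xxxl"]
SIZE_TO_RANK = {size: rank for rank, size in enumerate(SIZE_ORDER)}


def encode_size(value, unknown=None):
    """Return the ordinal rank of a size label, or `unknown` if it is not mapped."""
    if value is None:
        return unknown
    # Normalise case and whitespace so "S" and " s " map to the same rank.
    return SIZE_TO_RANK.get(value.strip().lower(), unknown)


sizes = ["S", "m", "XL", "xxxl", "xs"]
ranks = [encode_size(v) for v in sizes]
print(ranks)
```

Keeping the order in a single list means that adding a size later only requires inserting it at the right position, after which the ranks are regenerated.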