
Ordinal Feature Handling in Visual ML

As an analyst who wants to use visual ML, I'd like to be able to treat my categorical variables as ordinal variables, i.e. categories that have an order, so that I can represent my features more accurately and hopefully build better models.

I believe scikit-learn already has this idea in its underlying libraries.

In fact, I'd like to see this idea carried through the entire system, including charts.

An example might be the categories

s, m, l, xl, xxl, xxxl

There is a meaning to this order.  


Maybe this is something I can already do with a custom preprocessor. However, it is not immediately apparent to an analyst persona.

Status changed to: Acknowledged

Thanks @tgb417, that's a good suggestion.

Thanks @Krishna.

While you consider adding this to visual ML in DSS, is there any way today to do this through a custom preprocessor on the existing categorical variable?


Great question. It is possible to use sklearn.preprocessing.OrdinalEncoder (scikit-learn 0.20.4 documentation) as a custom preprocessor, but it's not fully supported: some features, such as partial dependence plots and subpopulation analysis, may have issues because the sklearn 0.20 version cannot handle unknown values (i.e. categories unseen at fit time but encountered during transformation). This is improved in the latest version of sklearn; however, visual ML does not yet support 0.24 because of significant changes in its API, which we are working on.
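For anyone stuck on sklearn 0.20, here is a minimal sketch of the kind of fit/transform object a custom preprocessor could wrap. It assumes the processor receives the column as an iterable of strings (e.g. a pandas Series) and may return a single numeric column; I haven't checked the exact interface DSS expects. `OrdinalCategoryEncoder` is a hypothetical name, not a DSS or sklearn class. Unlike OrdinalEncoder in 0.20, it maps unseen categories to a sentinel instead of raising.

```python
class OrdinalCategoryEncoder:
    """Hypothetical encoder: maps an ordered list of categories to 0..n-1
    and any category not in that list to a sentinel value."""

    def __init__(self, categories, unknown_value=-1):
        # The order of `categories` defines the ordinal ranking.
        self.categories = list(categories)
        self.unknown_value = unknown_value

    def fit(self, values):
        # Nothing is learned from the data: the order is fixed up front,
        # so categories missing from the training set still get their rank.
        self.mapping_ = {cat: rank for rank, cat in enumerate(self.categories)}
        return self

    def transform(self, values):
        # One output column; unseen categories become `unknown_value`
        # instead of raising an error.
        return [[self.mapping_.get(v, self.unknown_value)] for v in values]


sizes = ["s", "m", "l", "xl", "xxl", "xxxl"]
encoder = OrdinalCategoryEncoder(sizes).fit(["m", "l", "s"])
print(encoder.transform(["xl", "s", "xs"]))  # [[3], [0], [-1]]
```

The fixed category list is what carries the order. On scikit-learn 0.24 or later, `OrdinalEncoder(categories=[sizes], handle_unknown="use_encoded_value", unknown_value=-1)` should give comparable behavior, though it expects 2D input and returns floats.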


Hi @tgb417,

The best way to do ordinal encoding would be to use a Prepare recipe before the ML part of your flow. In the Prepare recipe, you can use a "Find and replace" processor to map your string values to numbers. In your example, you would map S to 0, M to 1, L to 2, etc.
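If you'd rather do the same mapping in code (for example in a Python recipe), a minimal sketch could look like this. The column name `size`, the `SIZE_ORDER` table and the `encode_sizes` helper are hypothetical, and rows are modelled as plain dicts to keep it self-contained; with pandas, `df["size"].map(SIZE_ORDER)` does the core of the same job (unmatched values become NaN there). In this sketch, values missing from the mapping are left unchanged; check how the "Find and replace" processor treats unmatched values in your setup.

```python
# Hypothetical mapping mirroring the "Find and replace" step: S -> 0, M -> 1, ...
SIZE_ORDER = {"S": 0, "M": 1, "L": 2, "XL": 3, "XXL": 4, "XXXL": 5}


def encode_sizes(rows, column="size", mapping=SIZE_ORDER):
    """Return copies of `rows` with `column` replaced by its ordinal rank.

    Values missing from `mapping` are left unchanged, so they stay visible
    as strings instead of silently becoming a number.
    """
    encoded = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        value = new_row.get(column)
        new_row[column] = mapping.get(value, value)
        encoded.append(new_row)
    return encoded


rows = [{"size": "M"}, {"size": "XXL"}, {"size": "S"}]
print(encode_sizes(rows))  # [{'size': 1}, {'size': 4}, {'size': 0}]
```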