Ordinal Feature Handling in Visual ML
As an Analyst wanting to use visual ML, I'd like to be able to treat my categorical variables like Ordinal variables, Categories that have an Order. So that I can more accurately represent my features and hopefully build better models.
I believe that Sci-Kit Learn has this idea in the underlying libraries.
I fact I'd like to see this idea carried through the entire system and into charts as well.
An example might be the categories
s, m, l, xl, xxl, xxxl
There is a meaning to this order.
Comments
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
Maybe this is something that I can already do with the Custom Processor. However, it is not immediately apparent, to an analyst persona.
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
Thanks @Krishna
.While you are considering adding this to Visual ML to DSS. Is there any way today to add this through custom preprocessor on the exiting catageorical variable?
-
Krishna Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Product Ideas Manager Posts: 18 Dataiker
Great question, it is possible to use sklearn.preprocessing.OrdinalEncoder - scikit-learn 0.20.4 documentation as a custom preprocessor, but it's not fully supported -- some features may have issues (e.g. partial dependence plots/sub population analysis) due to the lack of unknown value handling in the sklearn 0.20 version (i.e. categories unseen at fit time, but called upon during transformation). It's improved in the latest version of sklearn, however visual ML does not yet support 0.24 due to some significant changes in their API (which we are working on).
-
Hi @tgb417
,The best way to do ordinal encoding would be to use a Prepare recipe before the ML part of your flow. In the Prepare recipe, you can use a "Find and replace" processor and map your string values with numbers. As of your exemple, you would map S to 0, M to 1, L to 2, etc..