Ordinal Feature Handling in Visual ML

tgb417
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,595 Neuron

As an analyst using Visual ML, I'd like to be able to treat my categorical variables as ordinal variables, that is, categories that have an order, so that I can represent my features more accurately and hopefully build better models.

I believe that scikit-learn already supports this idea in the underlying libraries.

In fact, I'd like to see this idea carried through the entire system and into charts as well.

An example might be the categories:

s, m, l, xl, xxl, xxxl

There is a meaning to this order.
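
To illustrate why the order matters, here is a minimal Python sketch. It is not part of DSS, and the names `SIZE_ORDER`, `RANK`, and `size_rank` are hypothetical. A plain alphabetical sort scrambles the sizes, while sorting by an explicit rank keeps them in their intended order.

```python
# Hypothetical illustration using the size labels from the example above.
SIZE_ORDER = ["s", "m", "l", "xl", "xxl", "xxxl"]

# Position of each label in the intended order (s -> 0, m -> 1, ...).
RANK = {label: i for i, label in enumerate(SIZE_ORDER)}


def size_rank(label):
    """Return the ordinal position of a size label."""
    return RANK[label]


observed = ["xl", "s", "xxxl", "m", "l", "xxl"]

# Treating the labels as plain strings loses the meaning of the order.
alphabetical = sorted(observed)

# Sorting by rank restores the order that the labels actually carry.
ordinal = sorted(observed, key=size_rank)

print(alphabetical)
print(ordinal)
```

The same rank could in principle serve as a numeric feature for a model or as the axis order in a chart.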

1 vote

In the Backlog

Comments

  • tgb417

    Maybe this is something that I can already do with the Custom Processor. However, that is not immediately apparent to an analyst persona.

  • Krishna
    Krishna Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Product Ideas Manager Posts: 18 Dataiker

    Thanks @tgb417, that's a good suggestion.

  • tgb417

    Thanks @Krishna.

    While you are considering adding this to Visual ML in DSS, is there any way today to do this with a custom preprocessor on an existing categorical variable?

  • Krishna

    Great question. It is possible to use sklearn.preprocessing.OrdinalEncoder (see the scikit-learn 0.20.4 documentation) as a custom preprocessor, but it's not fully supported: some features may have issues (e.g. partial dependence plots and subpopulation analysis) because the scikit-learn 0.20 version does not handle unknown values (i.e. categories unseen at fit time but encountered at transform time). This is improved in the latest version of scikit-learn; however, Visual ML does not yet support 0.24 due to some significant changes in its API (which we are working on).

  • JeremieP
    JeremieP Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner Posts: 7 Dataiker

    Hi @tgb417,

    The best way to do ordinal encoding would be to use a Prepare recipe before the ML part of your flow. In the Prepare recipe, you can use a "Find and replace" processor to map your string values to numbers. For your example, you would map S to 0, M to 1, L to 2, and so on.
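
For readers who would rather prototype the mapping in code first, the following is a minimal plain-Python sketch of the same idea. It is not the Prepare recipe itself. The codes for XL and above are an assumed continuation of the S, M, L pattern, and `SIZE_TO_CODE` and `encode_size` are hypothetical names.

```python
# Assumed mapping: S, M, L come from the comment above; the remaining
# codes simply continue the same pattern.
SIZE_TO_CODE = {"S": 0, "M": 1, "L": 2, "XL": 3, "XXL": 4, "XXXL": 5}


def encode_size(value, unknown=None):
    """Map a size label to its integer code.

    Labels are normalised (trimmed, upper-cased) before lookup. Missing
    values and labels absent from the mapping return `unknown`.
    """
    if value is None:
        return unknown
    return SIZE_TO_CODE.get(value.strip().upper(), unknown)


# Example column with mixed case, an unseen label ("xxs"), and a missing value.
column = ["s", "XL", " m ", "xxs", None]
encoded = [encode_size(v) for v in column]
print(encoded)
```

Returning an explicit placeholder for unseen labels is one possible design choice; it makes unexpected categories visible instead of failing at lookup time.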
