Can I reset the "Unfold" processor to create more than 100 columns, or is there an alternative method, please?

Samtschan

Hi,

I would like to do a dummification on the data field "Insured City".

However, it will create more than 100 columns, which is the maximum number allowed for the Unfold processor in the Prepare recipe. Is there a way to increase the number of columns created, or is there an alternative method to the Unfold processor, please?

Thanks,

Sam

Answers

  • tgb417

    @Samtschan

    You don't say anything about how you plan to use this dummified data.

    If you plan to use it for model building with DSS's visual ML, the Visual ML recipes in the Lab have built-in feature handling options for categorical features and text variables.

    [Screenshots: categorical feature handling; text feature handling]

    This is how I usually handle dummification.
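
    If the dummy columns are needed outside visual ML, one possible alternative to the Unfold processor is to do the encoding in code. The following is only a minimal pandas sketch, not a Dataiku feature: the helper name `dummify_top_n`, the `top_n` cap, and the "OTHER" and "MISSING" labels are all assumptions made for illustration. It one-hot encodes a column and can optionally fold rare values into a single bucket to keep the column count manageable.

```python
import pandas as pd


def dummify_top_n(df, column, top_n=None):
    """One-hot encode `column` (hypothetical helper, not part of DSS).

    If top_n is given, only the top_n most frequent values get their own
    column; every other value is grouped under "OTHER".
    """
    # Treat missing values as their own category so no row is lost.
    values = df[column].astype(object).fillna("MISSING").astype(str)

    if top_n is not None:
        keep = values.value_counts().nlargest(top_n).index
        values = values.where(values.isin(keep), "OTHER")

    # One 0/1 column per remaining value, named "<column>_<value>".
    dummies = pd.get_dummies(values, prefix=column, dtype=int)
    return pd.concat([df.drop(columns=[column]), dummies], axis=1)


# Made-up sample data, for illustration only.
df = pd.DataFrame(
    {
        "Insured City": ["London", "Paris", "London", "Rome", "Paris", "London"],
        "premium": [100, 120, 90, 110, 95, 105],
    }
)
encoded = dummify_top_n(df, "Insured City", top_n=2)
print(encoded.columns.tolist())
```

    In a DSS Python recipe, the DataFrame would typically be read with `dataiku.Dataset(...).get_dataframe()` rather than built by hand. With `top_n=None` the helper creates one column per distinct value, however many there are.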

    Here is a video showing the use of these encoding options for categorical and text features. It will give you a sense of the ideas. However, the video is a bit old and the UI has changed somewhat.

    https://academy.dataiku.com/machine-learning-basics-open/522100

    Here is the current documentation.

    https://doc.dataiku.com/dss/latest/machine-learning/features-handling/index.html

    Follow up here and let us know how you are getting on with this, or share a bit more detail about your use case.

    Welcome to the Dataiku Community.

  • Samtschan

    Hi Tom,

    Thank you for your help. I was trying to apply one-hot encoding to transform a categorical field into numerical columns, hoping to improve the accuracy of the ML algorithms. I should be able to do so through the category handling options by selecting Dummy-encoding.

    Many Thanks,

    Sam
