Can I reset the "Unfold" processor to create more than 100 columns, or is there an alternative method?
Hi,
I would like to do a dummification on the data field "Insured City".
However, it will create more than 100 columns, which is the maximum number allowed for the Unfold processor in the Prepare recipe. Is there a way I can increase the number of columns created, or is there an alternative method to the Unfold processor, please?
Thanks,
Sam
Answers
-
tgb417
You don't say anything about how you plan to use this dummified data.
If you plan to use it for model building with DSS's visual ML, the Lab's Visual ML recipes have built-in feature handling options for categorical features and text variables.
These are how I usually handle dummification.
Here is a video showing the use of these various encoding options for categorical and text features. It will give you a sense of the ideas. However, the video is a bit old and the UI has changed somewhat.
https://academy.dataiku.com/machine-learning-basics-open/522100
Here is the current documentation.
https://doc.dataiku.com/dss/latest/machine-learning/features-handling/index.html
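If it helps to see what dummy encoding actually produces, here is a minimal pandas sketch that could be run in a notebook or Python recipe. It is only an illustration, not how DSS implements its encoding: the sample city values, the helper name dummy_encode, and the max_categories cap (which groups rarer values into an "Other" column to keep the column count manageable) are all assumptions made up for this example.

```python
import pandas as pd

# Hypothetical sample data; "Insured City" is the field from the question,
# the city values are invented for illustration.
df = pd.DataFrame(
    {"Insured City": ["Leeds", "York", "Leeds", "Bath", "York", "Leeds"]}
)


def dummy_encode(frame, column, max_categories=None):
    """One-hot encode `column`, optionally keeping only the most frequent values."""
    values = frame[column]
    if max_categories is not None:
        # Keep the most frequent categories; everything else becomes "Other".
        top = values.value_counts().nlargest(max_categories).index
        values = values.where(values.isin(top), "Other")
    # One 0/1 column per remaining category, named "<column>_<value>".
    dummies = pd.get_dummies(values, prefix=column, dtype=int)
    return pd.concat([frame.drop(columns=[column]), dummies], axis=1)


encoded = dummy_encode(df, "Insured City", max_categories=2)
print(encoded)
```

Capping the number of categories is one common way to avoid ending up with hundreds of sparse columns for a high-cardinality field such as a city name; whether that trade-off is acceptable depends on your model.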
Follow up here and let us know how you are getting on with this, or share a bit more detail about your use case.
Welcome to the Dataiku Community.
-
Hi Tom,
Thank you for your help. I was trying to apply one-hot encoding to transform a categorical field into numerical columns, in the hope of improving the accuracy of ML algorithms. I should be able to do so with the Category handling options by selecting Dummy-encoding.
Many Thanks,
Sam