I would like to do a dummification on the data field "Insured City".
However, this would create more than 100 columns, which is the maximum number allowed for the Unfold processor in the Prepare recipe. Is there a way I can increase the number of columns created, or is there an alternative to the Unfold processor, please?
You don't say anything about how you plan to use this dummified data.
If you plan to use it for model building with DSS's visual machine learning, the Lab's visual ML recipes have built-in feature handling options for categorical features and text variables.
This is how I usually handle dummification.
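A common way to keep dummification manageable for a high-cardinality field like "Insured City" is to create indicator columns only for the most frequent values and group everything else into a single "Others" column. The sketch below illustrates that general idea with pandas; it is not DSS's internal implementation, and the function name `dummy_encode_top_n`, the cap of 2, and the sample data are hypothetical.

```python
import pandas as pd


def dummy_encode_top_n(df: pd.DataFrame, column: str, max_categories: int) -> pd.DataFrame:
    """One-hot encode `column`, keeping only its `max_categories` most frequent values.

    Every other value (including missing values, which value_counts() ignores)
    is relabelled "Others", so at most max_categories + 1 indicator columns are created.
    """
    top_values = df[column].value_counts().nlargest(max_categories).index
    # Keep values that are in the top N; replace all others with a shared label.
    capped = df[column].where(df[column].isin(top_values), "Others")
    dummies = pd.get_dummies(capped, prefix=column, dtype=int)
    # Swap the original categorical column for its indicator columns.
    return pd.concat([df.drop(columns=[column]), dummies], axis=1)


# Hypothetical sample data.
df = pd.DataFrame({
    "Insured City": ["Paris", "Lyon", "Paris", "Nice", "Paris", "Lyon", "Lille"],
    "premium": [120.0, 95.5, 130.0, 88.0, 110.0, 99.0, 101.0],
})
encoded = dummy_encode_top_n(df, "Insured City", max_categories=2)
print(list(encoded.columns))
```

Capping the number of dummies trades some detail on rare cities for a feature set whose width no longer grows with the number of distinct values.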
Here is a video showing these encoding options for categorical and text features. It will give you a sense of the ideas, although the video is a bit old and the UI has changed somewhat.
Here is the current documentation.
Follow up here and let us know how you are getting on with this, or share a bit more detail about your use case.
Welcome to the Dataiku Community.
Thank you for your help. I was trying to apply one-hot encoding to transform a categorical field into numerical features, so that the accuracy of ML algorithms can be improved. I should be able to do this in the Category handling settings by selecting Dummy-encoding.
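If the encoded columns are needed outside the visual ML recipe, one possible alternative to the Unfold processor is a Python code recipe using pandas, which has no fixed cap on the number of generated columns. This is a minimal sketch: in DSS the DataFrame would normally be read from the recipe's input dataset, which is not shown here, and the sample data and `premium` column are hypothetical.

```python
import pandas as pd

# Hypothetical sample data standing in for the real input dataset.
df = pd.DataFrame({
    "Insured City": ["Paris", "Lyon", "Paris", "Nice"],
    "premium": [120.0, 95.5, 130.0, 88.0],
})

# One-hot encoding: one 0/1 indicator column per distinct city.
one_hot = pd.get_dummies(df, columns=["Insured City"], dtype=int)

# Dummy (k-1) variant: drop the first category's column, which avoids
# perfectly collinear indicators in linear models.
dummy = pd.get_dummies(df, columns=["Insured City"], drop_first=True, dtype=int)

print(list(one_hot.columns))
print(list(dummy.columns))
```

With several hundred distinct cities this produces several hundred columns, so a frequency cap like the one sketched for the visual ML settings may still be worth considering.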