
Can I configure the "Unfold" processor to create more than 100 columns, or is there an alternative method?

Samtschan
Level 1

Hi, 

I would like to do a dummification on the data field "Insured City".

However, this will create more than 100 columns, which is the maximum number allowed for the Unfold processor in the Prepare recipe. Is there a way to increase the number of columns it can create, or is there an alternative to the Unfold processor?

Thanks, 

Sam 

tgb417
Neuron

@Samtschan 

You don't say anything about how you plan to use this dummified data. 

If you plan to use it for model building in DSS's visual ML, the Lab's Visual ML recipes have built-in feature handling options for categorical features and for text variables.

[Screenshot: Categorical Feature Handling.jpg]

[Screenshot: Text Feature Handling.jpg]

This is how I usually handle dummification.

Here is a video showing these encoding options for categorical and text features. It will give you a sense of the ideas, although the video is a bit old and the UI has changed somewhat since.

https://academy.dataiku.com/machine-learning-basics-open/522100

Here is the current documentation.

https://doc.dataiku.com/dss/latest/machine-learning/features-handling/index.html

Follow up here and let us know how you are getting on with this, or share a bit more detail about your use case.

Welcome to the Dataiku Community.

--Tom
Samtschan
Level 1
Author

Hi Tom, 

Thank you for your help. I was trying to apply one-hot encoding to turn a categorical field into numerical columns, in the hope of improving the accuracy of the ML algorithms. I should be able to do this through the category handling settings by selecting Dummy-encoding.
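For reference, here is a minimal sketch of what one-hot (dummy) encoding does to a field like "Insured City". It is plain Python, not Dataiku's own implementation: the `one_hot_encode` helper, the `policy_id` field and the sample rows are all made up for illustration.

```python
def one_hot_encode(rows, column):
    """Replace `column` with one 0/1 indicator column per distinct value."""
    # Sort the distinct values so the output column order is deterministic.
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every other field unchanged.
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical sample records; only the "Insured City" field name comes from this thread.
rows = [
    {"policy_id": 1, "Insured City": "Leeds"},
    {"policy_id": 2, "Insured City": "York"},
    {"policy_id": 3, "Insured City": "Leeds"},
]

encoded = one_hot_encode(rows, "Insured City")
for row in encoded:
    print(row)
```

Each distinct city becomes its own column, so a field with more than 100 cities produces more than 100 columns. If the data is in a pandas DataFrame, `pandas.get_dummies(df, columns=["Insured City"])` performs the same transformation.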

Many Thanks, 

Sam 
