Can I reset the "Unfold processor" to create more than 100 columns, or is there an alternative method?

Samtschan · ‎12-03-2020

Hi,

I would like to do a dummification on the data field "Insured City".

However, it will create more than 100 columns, which is the maximum number allowed for the Unfold processor in the prepare recipe. Is there a way I can increase the number of columns created, or is there an alternative method to the Unfold processor, please?

Thanks,

Sam

tgb417 · ‎12-03-2020

@Samtschan

You don't say anything about how you plan to use this dummified data.

If you plan to use it for model building with DSS's visual ML, the Lab's Visual ML tools have a series of built-in feature handling options for categorical features and text variables.

This is how I usually handle dummification.

Here is a video showing the use of these various encoding options for categorical and text features. This will give you a sense of the ideas. However, the video is a bit old and the UI has changed somewhat.

https://academy.dataiku.com/machine-learning-basics-open/522100

Here is the current documentation.

https://doc.dataiku.com/dss/latest/machine-learning/features-handling/index.html
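If you would rather do the encoding in code, for example in a Python recipe, here is a minimal sketch using plain pandas. It is not how the Visual ML feature handling is implemented. The helper name `dummy_encode`, its `max_categories` and `other_label` parameters, and the sample data are all assumptions made up for illustration. Only the "Insured City" column name comes from the question. Capping the number of categories is one way to keep the column count manageable when a field has many distinct values.

```python
import pandas as pd


def dummy_encode(df, column, max_categories=None, other_label="Other"):
    """One-hot encode `column` (hypothetical helper, not a Dataiku API).

    If `max_categories` is given, only the most frequent categories get
    their own column; every other value (including missing values) is
    grouped under `other_label`.
    """
    values = df[column].astype("object")
    if max_categories is not None:
        # Most frequent categories; value_counts ignores missing values
        top = values.value_counts().nlargest(max_categories).index
        values = values.where(values.isin(top), other_label)
    # One 0/1 column per remaining category, prefixed with the column name
    dummies = pd.get_dummies(values, prefix=column, dtype=int)
    return pd.concat([df.drop(columns=[column]), dummies], axis=1)


# Made-up sample data for illustration only
df = pd.DataFrame({
    "Insured City": ["Leeds", "York", "Leeds", "Bath", "York", "Leeds"],
    "Premium": [100, 120, 90, 110, 130, 95],
})

encoded = dummy_encode(df, "Insured City", max_categories=2)
print(encoded.columns.tolist())
```

With `max_categories=2`, the two most frequent cities keep their own columns and the remaining one is folded into a single "Other" column. Leaving `max_categories` as `None` creates one column per distinct value.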

Follow up here and let us know how you are getting on with this, or share a bit more detail about your use case.

Welcome to the Dataiku Community.

--Tom

Samtschan · ‎12-04-2020

Hi Tom,

Thank you for your help. I was trying to apply one-hot encoding to transform a categorical field into numerical features, hoping to improve the accuracy of the ML algorithms. I should be able to do so in the Category handling settings by selecting Dummy-encoding.

Many Thanks,

Sam
