
Impact Encoding, method and cross validation.


Impact encoding is offered as an alternative to one-hot encoding. However, I was not able to find clear documentation on two things:

1) Which method is used? Are the encodings simple per-category averages/probabilities, or is some kind of shrinkage applied to prevent overfitting?

2) Is the encoding computed within the cross-validation loop (i.e., a new encoding fitted for each fold), or once on the whole training set before cross-validation starts? The same question applies to the imputation methods.
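To make the two alternatives concrete, here is a minimal sketch of what I mean. It is not a description of what Dataiku actually does (that is exactly what I am asking). The function names, the `smoothing` parameter, and the simple modulo-based fold split are all my own assumptions for illustration. With `smoothing=0` the encoding is the plain per-category average; larger values shrink rare categories toward the global mean. `encode_within_folds` shows the "new encoding for each fold" variant, where held-out rows never contribute to their own encoding.

```python
from collections import defaultdict


def fit_impact_encoding(categories, targets, smoothing=10.0):
    """Learn a (possibly shrunken) mean of the target for each category level.

    Hypothetical helper for illustration only, not a Dataiku API.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    # smoothing = 0 gives the plain per-category average;
    # larger values pull rare categories toward the global mean.
    mapping = {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }
    return mapping, global_mean


def apply_impact_encoding(categories, mapping, default):
    """Encode categories, falling back to `default` for unseen levels."""
    return [mapping.get(cat, default) for cat in categories]


def encode_within_folds(categories, targets, n_folds=3, smoothing=10.0):
    """Fit the encoding on the other folds only, then encode the held-out rows."""
    n = len(categories)
    encoded = [None] * n
    for fold in range(n_folds):
        # Assumed fold assignment: row i belongs to fold i % n_folds.
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        mapping, global_mean = fit_impact_encoding(
            [categories[i] for i in train_idx],
            [targets[i] for i in train_idx],
            smoothing,
        )
        held_out = apply_impact_encoding(
            [categories[i] for i in test_idx], mapping, global_mean
        )
        for i, value in zip(test_idx, held_out):
            encoded[i] = value
    return encoded
```

If the encoding is instead fitted once on the whole training set (calling `fit_impact_encoding` on all rows before splitting), each row's own target leaks into its encoded feature, which can make the cross-validation score look better than it should.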



Operating system used: Linux

