Feature engineering on Test/Holdout dataset

Obireddy

Hi Team,

I have a dataset with 15 features and 1M records, which I split into train and test sets (80/20).

When I trained the model (XGBoost) on the train data using the AutoML lab in Dataiku, I passed only the 10 best features and applied min-max rescaling and imputation to those features.

Now I am passing the test data to the same model to score probabilities. Should I explicitly apply min-max rescaling and imputation before predicting, or will the trained model apply the same feature engineering it used on the train data?

AdrienL
Dataiker

Any feature preprocessing configured in the settings of the AutoML model is applied automatically when scoring with the trained model.
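To make this concrete outside Dataiku, here is a minimal, hypothetical pure-Python sketch (not Dataiku's code or API) of what "preprocessing stored with the model" means: the imputation values and min-max bounds are learned once from the train rows and then reused unchanged on any later rows. The class name `FittedPreprocessor`, the choice of mean imputation, and the toy data are assumptions for illustration only.

```python
from statistics import mean


class FittedPreprocessor:
    """Hypothetical stand-in for the preprocessing saved with a trained model:
    mean imputation followed by min-max rescaling, one parameter set per feature."""

    def fit(self, rows):
        # Learn per-feature statistics from the TRAIN rows only (None = missing).
        n_features = len(rows[0])
        self.fill_values, self.mins, self.maxs = [], [], []
        for j in range(n_features):
            observed = [r[j] for r in rows if r[j] is not None]
            self.fill_values.append(mean(observed))
            self.mins.append(min(observed))
            self.maxs.append(max(observed))
        return self

    def transform(self, rows):
        # Reuse the stored train statistics; nothing is re-learned here.
        out = []
        for r in rows:
            new_row = []
            for j, v in enumerate(r):
                if v is None:
                    v = self.fill_values[j]  # impute with the train mean
                span = self.maxs[j] - self.mins[j]
                # Min-max rescale with train bounds; a constant feature maps to 0.0.
                new_row.append((v - self.mins[j]) / span if span else 0.0)
            out.append(new_row)
        return out


# Toy data (assumed): two features, None marks a missing value.
train = [[1.0, 10.0], [3.0, None], [5.0, 30.0]]
test = [[None, 20.0], [7.0, 40.0]]

pre = FittedPreprocessor().fit(train)
scored_input = pre.transform(test)            # what the model sees at scoring time
refit_on_test = FittedPreprocessor().fit(test).transform(test)  # the mistake to avoid

print(scored_input)   # [[0.5, 0.5], [1.5, 1.5]]
print(refit_on_test)  # [[0.0, 0.0], [0.0, 1.0]]
```

Test values outside the train range map outside [0, 1] because the train bounds are reused. Rescaling the test set yourself would either learn different bounds (as `refit_on_test` shows) or, if the already-rescaled data were then fed to the model, apply the rescaling twice.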

Obireddy

Thank you. So you mean we do not need to do any feature engineering on test/future/upcoming data, and the trained model itself will take care of all of it as designed during training, right?

AdrienL
Dataiker

Yes, the same preprocessing that was configured in the Design settings of the model will be applied at scoring time.
