Feature engineering on Test/Holdout dataset
Hi Team,
I have a dataset consisting of 15 features and 1M records. I split it into train and test sets with an 80/20 ratio.
While training the model (XGBoost) on the train data using the AutoML lab in Dataiku, I passed only the 10 best features and applied min-max rescaling and imputation to those features.
Now I am passing the test data to score probabilities with the same model. Should I apply min-max rescaling and imputation explicitly before predicting, or will the trained model take care of it by applying the same feature engineering that was used on the train data?
Best Answer
-
Yes, the same preprocessing that was configured in the Design settings of the model will be applied at scoring time.
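To illustrate the general idea (this is not Dataiku's internal code, just a minimal plain-Python sketch), preprocessing parameters are learned once from the train data and then reused unchanged on any data scored later. The class `MinMaxImputer` and the sample rows below are hypothetical, and mean imputation is assumed only for the sake of the example.

```python
from statistics import mean


class MinMaxImputer:
    """Toy stand-in for preprocessing stored with a model (hypothetical)."""

    def fit(self, rows):
        # Learn per-column fill value, min and max from the TRAIN rows only.
        # None marks a missing value.
        self.fill_, self.lo_, self.hi_ = [], [], []
        for col in zip(*rows):
            observed = [v for v in col if v is not None]
            self.fill_.append(mean(observed))
            self.lo_.append(min(observed))
            self.hi_.append(max(observed))
        return self

    def transform(self, rows):
        # Reuse the stored train statistics; nothing is re-learned here.
        out = []
        for row in rows:
            scaled = []
            for v, fill, lo, hi in zip(row, self.fill_, self.lo_, self.hi_):
                v = fill if v is None else v
                span = hi - lo
                scaled.append((v - lo) / span if span else 0.0)
            out.append(scaled)
        return out


train = [[0.0, 10.0], [5.0, None], [10.0, 30.0]]
test = [[None, 20.0], [20.0, 10.0]]

prep = MinMaxImputer().fit(train)      # statistics come from train only
train_scaled = prep.transform(train)
test_scaled = prep.transform(test)     # same statistics applied to test
print(test_scaled)
```

Note that a test value outside the train range (20.0 in the first column) is scaled above 1, because the min and max come from the train data rather than being recomputed on the test data. That is the behaviour you want at scoring time, and it is why you should not rescale the test set yourself.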
Answers
-
Any feature preprocessing applied in the settings of the AutoML model will be automatically applied when scoring with the trained model.
-
Thank you. So you mean we do not need to do any feature engineering on test/future/upcoming data, and the trained model itself will take care of all of that as designed during training. Right?