Feature engineering on Test/Holdout dataset

obireddy

Hi Team,

I have a dataset of 15 features and 1M records. I split it into train and test sets with an 80/20 ratio.

While training the model (XGBoost) on the train data with the AutoML lab in Dataiku, I passed only the 10 best features and applied min-max rescaling and imputation to those features.

Now I am passing the test data to the same model to score probabilities. Should I apply min-max rescaling and imputation explicitly before predicting, or will the trained model apply the same feature engineering steps that were defined on the train data?


Answers

  • AdrienL (Dataiker)

    Any feature preprocessing configured in the settings of the AutoML model is applied automatically when you score with the trained model.
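To illustrate the idea behind this answer, here is a minimal sketch in plain Python, under stated assumptions. It is not Dataiku's internal implementation. `FittedPreprocessor` is a hypothetical class, and the choice of median imputation applied before min-max rescaling is also an assumption. The point it shows is that the imputation values and the min/max bounds are learned once from the train rows and then reused unchanged on test or future rows. Columns that were not selected for training are simply ignored.

```python
from dataclasses import dataclass, field
from statistics import median
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


@dataclass
class FittedPreprocessor:
    """Hypothetical stand-in for the preprocessing a trained model stores."""
    features: List[str]  # the selected features, e.g. the "10 best" ones
    impute_values: Dict[str, float] = field(default_factory=dict)
    mins: Dict[str, float] = field(default_factory=dict)
    maxs: Dict[str, float] = field(default_factory=dict)

    def fit(self, rows: List[Row]) -> "FittedPreprocessor":
        # Learn the parameters from the TRAIN rows only (assumed: median imputation).
        for f in self.features:
            observed = [r[f] for r in rows if r.get(f) is not None]
            self.impute_values[f] = median(observed)
            self.mins[f] = min(observed)
            self.maxs[f] = max(observed)
        return self

    def transform(self, rows: List[Row]) -> List[List[float]]:
        # Reuse the stored train parameters; nothing is re-fitted on new data.
        out = []
        for r in rows:
            vec = []
            for f in self.features:  # columns outside the selection are ignored
                v = r.get(f)
                if v is None:
                    v = self.impute_values[f]  # impute with the train median
                span = self.maxs[f] - self.mins[f]
                vec.append((v - self.mins[f]) / span if span else 0.0)
            out.append(vec)
        return out


train = [{"a": 0.0, "b": 10.0}, {"a": 5.0, "b": None}, {"a": 10.0, "b": 30.0}]
pre = FittedPreprocessor(["a", "b"]).fit(train)

test = [{"a": 20.0, "b": None, "unused": 1.0}]
print(pre.transform(test))  # [[2.0, 0.5]]
```

The test value `a = 20.0` is rescaled to 2.0 rather than 1.0 because the train bounds (0 to 10) are reused instead of being recomputed on the test data. The missing `b` is filled with the train median (20.0) before rescaling. Fitting a scikit-learn `Pipeline` on the train set and then calling `predict` on the test set follows the same principle.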

  • obireddy

    Thank you. So you mean we do not need to do any feature engineering on test/future/upcoming data, and the trained model itself will take care of all the steps we designed during training, right?
