Feature engineering on Test/Holdout dataset

obireddy Registered Posts: 13 ✭✭

Hi Team,

I have a dataset consisting of 15 features and 1M records. I split the dataset into train and test sets with an 80/20 split.

While training the model (XGBoost) on the train data using the AutoML Lab in Dataiku, I passed only the 10 best features and applied min-max rescaling and imputation to those features.

Now I am passing the test data to the same model to score probabilities. Should I apply min-max rescaling and imputation explicitly before predicting, or will the trained model apply the same feature engineering steps that were used on the train data?


Best Answer


  • AdrienL
    AdrienL Dataiker, Alpha Tester Posts: 196 Dataiker

    Any feature preprocessing configured in the settings of the AutoML model is applied automatically when scoring with the trained model.

  • obireddy
    obireddy Registered Posts: 13 ✭✭

    Thank you. So you mean we do not need to do any feature engineering on test/future/upcoming data, and the trained model itself will take care of all those things as designed during training. Right?
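
To illustrate the idea behind the accepted answer, here is a minimal plain-Python sketch of preprocessing that is fitted once on the train data and then reused unchanged at scoring time. It is not Dataiku's internal implementation; the class name `FittedPreprocessor`, the mean-imputation strategy, and the toy data are all assumptions made for illustration.

```python
from statistics import mean


class FittedPreprocessor:
    """Hypothetical helper: learns imputation and min-max parameters from train data only."""

    def fit(self, rows):
        # rows: list of equal-length lists of floats; None marks a missing value
        n_cols = len(rows[0])
        self.fill_, self.min_, self.max_ = [], [], []
        for j in range(n_cols):
            observed = [r[j] for r in rows if r[j] is not None]
            self.fill_.append(mean(observed))  # imputation value (assumed: column mean)
            self.min_.append(min(observed))    # min-max bounds learned from train
            self.max_.append(max(observed))
        return self

    def transform(self, rows):
        # Apply the stored train-time parameters; nothing is re-estimated here.
        out = []
        for r in rows:
            new_row = []
            for j, v in enumerate(r):
                v = self.fill_[j] if v is None else v
                span = self.max_[j] - self.min_[j]
                new_row.append((v - self.min_[j]) / span if span else 0.0)
            out.append(new_row)
        return out


# Toy data, invented for this example
train = [[1.0, 10.0], [3.0, None], [5.0, 30.0]]
test = [[None, 20.0], [7.0, 10.0]]

prep = FittedPreprocessor().fit(train)  # parameters come from train only
train_scaled = prep.transform(train)
test_scaled = prep.transform(test)      # raw test rows, same stored parameters

print(test_scaled)
```

In this sketch the test value 7.0 lies outside the train range of the first column, so it rescales to a value above 1. That is expected when the bounds come from the train data, and it is why the test set should not be rescaled separately with its own minimum and maximum.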
