How is the variable importance computed?

Ju
Level 1

Hello,

I am looking for information about variable importance for the Random Forest and XGBoost models. I get very different outputs from the two.

I would like to know what method you use to compute them. Is it not the same one for all models?

pmasiphelps
Dataiker

Hi,

For Random Forest, visual ML uses the standard attribute from sklearn: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklea...

Same thing with XGBoost, the standard attribute: https://xgboost.readthedocs.io/en/stable/python/python_api.html

Note that the importances shown are for the preprocessed features according to your Design screen settings (e.g. if you do standard rescaling in the Features Handling tab, importances are shown for the rescaled features).
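For illustration, here is a minimal sketch of reading that attribute directly in scikit-learn. The dataset and model settings are made up for the example and are not Dataiku's actual training configuration:

```python
# Sketch: reading sklearn's impurity-based feature importances
# (the feature_importances_ attribute linked above).
# Synthetic data; model settings are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# One value per (preprocessed) input feature; the values sum to 1.
for i, imp in enumerate(rf.feature_importances_):
    print(f"feature_{i}: {imp:.3f}")
```

Note that these are the importances of the features as the model saw them, which is why preprocessing (rescaling, encoding) changes what the numbers refer to.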

Best,

Pat

Tao_Z999
Level 2

Hi, thanks! Just a follow-up question: which "importance type" is used in Dataiku's XGBoost? The official documentation lists several types, and the default is "weight", so I guess it's "weight"?

  • importance_type (Optional[str]) –

    The feature importance type for the feature_importances_ property:

    • For tree model, it's either "gain", "weight", "cover", "total_gain" or "total_cover".

    • For linear model, only "weight" is defined and it's the normalized coefficients without bias.