Hello,
I am looking for information about how variable importance is computed for the Random Forest and XGBoost models. The two models give me very different outputs.
I would like to know which method you use to compute the importances. Is it not the same one for all models?
Hi,
For Random Forest, visual ML uses the standard attribute from sklearn: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklea...
Same thing with XGBoost, the standard attribute: https://xgboost.readthedocs.io/en/stable/python/python_api.html
Note that the importances shown are for the preprocessed features, according to your Design screen settings (e.g. if you apply standard rescaling in the Features Handling tab, importances are shown for the rescaled features).
Best,
Pat
Hi, thanks. Just a follow-up question: which "importance type" is used for XGBoost in Dataiku? The official documentation lists several types and says the default is "weight", so I guess it's "weight"?
importance_type (Optional[str]):
The feature importance type for the feature_importances_ property:
For tree model, it's either "gain", "weight", "cover", "total_gain" or "total_cover".
For linear model, only "weight" is defined and it's the normalized coefficients without bias.
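To illustrate why the choice of importance type matters, here is a minimal pure-Python sketch of how the five tree-model types listed above can rank the same features differently. It does not call XGBoost and is not Dataiku's implementation: the `splits` data and the `importance` helper are hypothetical, "cover" is approximated by a sample count, and the values are normalized to sum to 1 purely for readability.

```python
from collections import defaultdict

# Hypothetical record of every split in a small tree ensemble:
# (feature name, gain of the split, cover of the split).
# "cover" is approximated here by the number of samples reaching the split.
splits = [
    ("age", 0.5, 100),
    ("age", 0.25, 40),
    ("age", 0.25, 60),
    ("income", 4.0, 100),
]

def importance(splits, importance_type):
    """Return normalized per-feature importances for one importance type."""
    count = defaultdict(int)          # how many splits use each feature
    total_gain = defaultdict(float)   # summed gain over those splits
    total_cover = defaultdict(float)  # summed cover over those splits
    for feature, gain, cover in splits:
        count[feature] += 1
        total_gain[feature] += gain
        total_cover[feature] += cover

    if importance_type == "weight":
        raw = dict(count)
    elif importance_type == "total_gain":
        raw = dict(total_gain)
    elif importance_type == "total_cover":
        raw = dict(total_cover)
    elif importance_type == "gain":
        # average gain per split using the feature
        raw = {f: total_gain[f] / count[f] for f in count}
    elif importance_type == "cover":
        # average cover per split using the feature
        raw = {f: total_cover[f] / count[f] for f in count}
    else:
        raise ValueError(f"unknown importance_type: {importance_type}")

    norm = sum(raw.values())
    return {f: v / norm for f, v in raw.items()}

by_weight = importance(splits, "weight")
by_gain = importance(splits, "gain")
by_total_gain = importance(splits, "total_gain")

# The top-ranked feature depends on the importance type.
top_by_weight = max(by_weight, key=by_weight.get)
top_by_gain = max(by_gain, key=by_gain.get)
```

In this toy data, "age" is used in more splits, so it ranks first by "weight", while "income" has a single split with a much larger gain, so it ranks first by "gain" and "total_gain". This is one reason importances from two libraries, or from two importance types, can look very different for the same dataset.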