Discrepancies between feature importance through UI and Python API
Hi all,
I've been looking at feature importance in the Explainability tab of a Saved Model. I noticed that, when I extract these feature importance values through the Python API, I get a different result from the absolute feature importance graph shown in the UI. I've attached a picture of the graph.
The Python code I've been using:

# Import modules
import dataiku

# Get project handle
client = dataiku.api_client()
project = client.get_default_project()

# Get saved model
savedModel = project.get_saved_model(sm_id="guAFiNqy")

# Get the active version and its details
activeVersion = savedModel.get_active_version()
versionDetails = savedModel.get_version_details(activeVersion["id"])

# Get absolute feature importance values
absoluteImportance = versionDetails.get_raw()["globalExplanationsAbsoluteImportance"]

# Alternatively: recompute Shapley feature importance
absoluteFeatureImportance = versionDetails.compute_shapley_feature_importance()
rawResult = absoluteFeatureImportance.wait_for_result()
absoluteImportance = rawResult["absoluteImportance"]

The result of this code is shown below for a few features:

{'bmTrede': 0.025203506892614976,
 'looptijdJaar': 0.1638402888326436,
 'leeftijdAuto': 0.25731763242892847,
 'leeftijdBestuurder': 0.18891522802689867,
 'leeftijd': 0.05215805599269218,
 ...}
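To rule out a simple scaling or ordering difference, I also normalized and sorted the API values before comparing them to the bars in the UI. A minimal sketch, assuming absoluteImportance is the dict returned by the snippet above:

# Sanity check: normalize the importance values so they sum to 1 and
# print them from largest to smallest, to compare against the UI chart
total = sum(absoluteImportance.values())
for feat, val in sorted(absoluteImportance.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{feat}: {val / total:.4f}")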
Clearly there is a difference. Could anyone help me understand where this difference is coming from? The model is a supervised binary classification model trained with the XGBoost algorithm and k-fold cross-testing.
Sidenote 1: if I look at the RawImportances in the VersionDetails object, I see that the values correspond to the Gini feature importance.
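For comparison, this is roughly how that kind of impurity-based importance can be read directly from a plain XGBoost model outside of Dataiku. A standalone sketch on synthetic data, not the DSS API; the dataset and parameters are placeholders, and I'm assuming the gain-based score is the number being matched:

import xgboost as xgb
from sklearn.datasets import make_classification

# Standalone illustration: train a small binary classifier and read
# XGBoost's native gain-based importance, which is the impurity-style
# number the RawImportances appear to correspond to
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = xgb.XGBClassifier(n_estimators=50, max_depth=3)
model.fit(X, y)

# Gain-based importance per feature, normalized to sum to 1
scores = model.get_booster().get_score(importance_type="gain")
total = sum(scores.values())
print({feat: round(val / total, 4) for feat, val in scores.items()})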
Sidenote 2: the feature drift importance chart in our Model Evaluation Store (MES) does not correspond to the Shapley feature importance either. We discovered that the MES actually shows the column importance of the original model, which is quite confusing.