Discrepancies between feature importance through UI and Python API

Jus · ‎01-22-2024

Hi all,

I've been looking at feature importance in the Explainability tab of a Saved Model. I noticed that, when I extract these feature importance values through the Python API, I get a different result compared to the absolute feature importance graph shown in the UI. I attached the picture of the graph.

The Python code I've been using:

# Import modules
import dataiku

# Get project handle
client = dataiku.api_client()
project = client.get_default_project()

# Get saved model
savedModel   = project.get_saved_model(sm_id="guAFiNqy")

# Get correct version
activeVersion   = savedModel.get_active_version()
versionDetails  = savedModel.get_version_details(activeVersion["id"])

# Get absolute feature importance values
absoluteImportance = versionDetails.get_raw()["globalExplanationsAbsoluteImportance"]

# Alternatively: recompute Shapley feature importance
absoluteFeatureImportance = versionDetails.compute_shapley_feature_importance()
rawResult = absoluteFeatureImportance.wait_for_result()
absoluteImportance = rawResult["absoluteImportance"]

The result of this code is shown below for a few features:

{'bmTrede': 0.025203506892614976,
  'looptijdJaar': 0.1638402888326436,
  'leeftijdAuto': 0.25731763242892847,
  'leeftijdBestuurder': 0.18891522802689867,
  'leeftijd': 0.05215805599269218,
...}

Clearly there is a difference. Could anyone help me understand where this difference comes from? The model is a supervised binary classification model using the XGBoost algorithm with k-fold cross testing.
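One thing that might be worth ruling out is a pure scaling difference, in case the chart shows each feature's share of the total rather than the raw values. That is only an assumption on my side, not something I have confirmed. A minimal sketch of such a check is below; `normalize_importance` is a hypothetical helper of my own, not part of the Dataiku API, and the input is just the five values listed above, so the shares would come out differently on the full dictionary.

```python
def normalize_importance(importance):
    """Scale absolute importance values so they sum to 1, largest first."""
    total = sum(abs(value) for value in importance.values())
    if total == 0:
        raise ValueError("all importance values are zero")
    shares = {name: abs(value) / total for name, value in importance.items()}
    # Sort descending so the order can be compared with the bars in the chart
    return dict(sorted(shares.items(), key=lambda item: item[1], reverse=True))


# Illustration only: the five values shown above, not the complete result
partialImportance = {
    "bmTrede": 0.025203506892614976,
    "looptijdJaar": 0.1638402888326436,
    "leeftijdAuto": 0.25731763242892847,
    "leeftijdBestuurder": 0.18891522802689867,
    "leeftijd": 0.05215805599269218,
}

shares = normalize_importance(partialImportance)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

If the ordering of the features matches the chart but the magnitudes do not, the gap is probably a matter of scaling; if the ordering itself differs, the two sources are likely computing different quantities.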

Sidenote1: if I look at the RawImportances in the VersionDetails object, I see that the values correspond to the Gini feature importance.

Sidenote2: the feature drift importance chart in our MES does not correspond to the Shapley feature importance. We discovered that the MES actually shows the column importance for the original model, which is quite confusing.
