Discrepancies between feature importance through UI and Python API

Jus Registered Posts: 7
edited July 16 in Using Dataiku

Hi all,

I've been looking at feature importance in the Explainability tab of a Saved Model. I noticed that, when I extract these feature importance values through the Python API, I get a different result compared to the absolute feature importance graph shown in the UI. I attached the picture of the graph.

The Python code I've been using:

# Import modules
import dataiku

# Get a handle on the current project
client = dataiku.api_client()
project = client.get_default_project()

# Get the saved model
savedModel = project.get_saved_model(sm_id="guAFiNqy")

# Get the details of the active version
activeVersion = savedModel.get_active_version()
versionDetails = savedModel.get_version_details(activeVersion["id"])

# Get the absolute feature importance values
absoluteImportance = versionDetails.get_raw()["globalExplanationsAbsoluteImportance"]

# Alternatively: recompute the Shapley feature importance
absoluteFeatureImportance = versionDetails.compute_shapley_feature_importance()
rawResult = absoluteFeatureImportance.wait_for_result()
absoluteImportance = rawResult["absoluteImportance"]

The result of this code is shown below for a few features:

{'bmTrede': 0.025203506892614976,
  'looptijdJaar': 0.1638402888326436,
  'leeftijdAuto': 0.25731763242892847,
  'leeftijdBestuurder': 0.18891522802689867,
  'leeftijd': 0.05215805599269218,
  ...}

Clearly there is a difference. Could anyone help me understand where this difference comes from? The model is a supervised binary classification model trained with the XGBoost algorithm and k-fold cross-testing.
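To pin down how far the two sources actually diverge, it can help to normalize both importance dicts so they each sum to 1 and then rank the per-feature differences. A minimal sketch (the values below are made-up placeholders; substitute the dict read from the UI and the one returned by the API):

```python
# Hypothetical values for illustration only; replace with the importances
# read from the Explainability tab and those returned by the Python API.
ui_importance = {"looptijdJaar": 0.21, "leeftijdAuto": 0.31, "leeftijd": 0.05}
api_importance = {"looptijdJaar": 0.1638, "leeftijdAuto": 0.2573, "leeftijd": 0.0522}

def normalize(imp):
    """Rescale importances so they sum to 1, making the two sources comparable."""
    total = sum(imp.values())
    return {k: v / total for k, v in imp.items()}

ui_n = normalize(ui_importance)
api_n = normalize(api_importance)

# Per-feature gap, largest absolute difference first
diff = {k: ui_n[k] - api_n.get(k, 0.0) for k in ui_n}
for name, delta in sorted(diff.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:20s} {delta:+.4f}")
```

If the normalized values agree but the raw ones don't, the discrepancy is just a scaling convention; if the *ranking* differs, the two charts are computed from genuinely different quantities (e.g. Shapley vs. Gini).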

Sidenote 1: if I look at the rawImportances in the VersionDetails object, I see that the values correspond to the Gini feature importance.
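For reference, Gini (impurity-based) importance is the quantity tree libraries expose directly on the fitted model, and it is normalized to sum to 1, unlike Shapley-based importances. A minimal sketch using scikit-learn's RandomForestClassifier as a stand-in for XGBoost (synthetic data, illustrative only):

```python
# Impurity-based (Gini) importance: mean decrease in impurity per feature,
# averaged over all trees and normalized so the values sum to 1.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

gini = dict(zip([f"f{i}" for i in range(5)], clf.feature_importances_))
print(gini)
```

Because this normalization (and the impurity-based definition itself) differs from Shapley values, the two charts are not expected to match even for the same model.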

Sidenote 2: the feature drift importance chart in our Model Evaluation Store (MES) does not correspond to the Shapley feature importance either. We discovered that the MES actually shows the column importance of the original model, which is quite confusing.
