What is globalExplanationsTopImportances? How is it calculated?

I'm using the API to retrieve the following information for a regression model:
details.get_performance_metrics()['globalExplanationsTopImportances']
This returns a list of dictionaries with the keys "s" and "d", as follows:
[{s:"Feature1",d:0.25},{s:"Feature2",d:0.15}]
What value is given as d? What are the criteria for selecting the top features here (I see a varying number of features in this list)?
Is there any way to get the importance of all the variables as a dictionary?
Operating system used: Windows
Best Answer
AlexisD (Dataiker)
Hello!
Those values are computed using Shapley values. You can find more details about the process here.
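To make the Shapley part concrete, here is a toy sketch of the general technique, not Dataiku's implementation. It computes exact Shapley values by enumerating feature subsets, using a single baseline row to stand in for "absent" features, and then averages the absolute values per feature to get one global importance per feature. The baseline scheme, the mean-absolute aggregation, and the function names `shapley_values` and `mean_absolute_importance` are assumptions for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x), one per feature.

    Features outside a coalition take their value from `baseline`
    (a simplifying assumption for this toy example).
    """
    n = len(x)

    def value(coalition):
        row = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(row)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        # Sum weighted marginal contributions over every coalition without i.
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi += weight * (value(coalition | {i}) - value(coalition))
        phis.append(phi)
    return phis


def mean_absolute_importance(predict, rows, baseline, names):
    """Global importance per feature: mean of |Shapley value| over rows."""
    totals = [0.0] * len(names)
    for x in rows:
        for i, phi in enumerate(shapley_values(predict, x, baseline)):
            totals[i] += abs(phi)
    return {name: total / len(rows) for name, total in zip(names, totals)}


# Toy linear "model"; for it, each Shapley value is w_i * (x_i - baseline_i).
def predict(row):
    return 2.0 * row[0] - 1.0 * row[1] + 0.0 * row[2] + 3.0


rows = [[1.0, 2.0, 5.0], [3.0, 0.0, -1.0]]
baseline = [0.0, 0.0, 0.0]
importance = mean_absolute_importance(
    predict, rows, baseline, ["Feature1", "Feature2", "Feature3"]
)
print(importance)
```

Exact enumeration is exponential in the number of features, so practical implementations typically work on a sample of rows and approximate the values rather than enumerating every coalition.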
The `globalExplanationsTopImportances` list contains the 10 highest feature importance values.
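A minimal sketch of that "top 10" selection, assuming importances are ranked in descending order and returned in the `{"s": name, "d": value}` shape seen in the question. `top_importances` is a hypothetical helper, not part of the DSS API, and the example importances are made up.

```python
def top_importances(importances, k=10):
    """Return the k largest importances as [{"s": name, "d": value}, ...]."""
    # Sort (name, value) pairs by value, largest first.
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    # Keep at most k entries; fewer if fewer features have importances.
    return [{"s": name, "d": value} for name, value in ranked[:k]]


# Example with made-up importances for 12 features.
example = {f"Feature{i}": round(0.01 * i, 2) for i in range(1, 13)}
print(top_importances(example)[:2])
```

Because the slice stops at however many entries exist, this sketch returns a shorter list when fewer than `k` features have importances.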
I believe you are getting your `details` from `get_trained_model_snippet`. You can get the full absolute feature importance dictionary using `get_trained_model_details("id").details["globalExplanationsAbsoluteImportance"]` instead. The top 10 values of that dictionary should match the ones in the `globalExplanationsTopImportances` list from the snippet.
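To check this on your own model, a sketch like the following converts the top list into a dictionary and compares it with the largest entries of the full one. It assumes `globalExplanationsAbsoluteImportance` is a flat `{feature_name: importance}` dictionary and that the top list uses the `{"s": ..., "d": ...}` shape from the question; both helper names are hypothetical, and `ml_task` in the comments is a placeholder for whatever object you call `get_trained_model_details` on. The DSS calls appear only as comments because they need a live instance.

```python
def snippet_to_dict(top_list):
    """Convert [{"s": name, "d": value}, ...] into {name: value}."""
    return {item["s"]: item["d"] for item in top_list}


def top_matches(absolute_importance, top_list, k=10, tol=1e-9):
    """Check that the k largest absolute importances equal the top list."""
    expected = sorted(absolute_importance.items(),
                      key=lambda item: item[1], reverse=True)[:k]
    got = snippet_to_dict(top_list)
    if len(got) != len(expected):
        return False
    return all(name in got and abs(got[name] - value) <= tol
               for name, value in expected)


# Against a live DSS instance (not run here), the inputs would come from calls like:
#   absolute = ml_task.get_trained_model_details("id").details["globalExplanationsAbsoluteImportance"]
#   top_list = details.get_performance_metrics()["globalExplanationsTopImportances"]
absolute = {"Feature1": 0.25, "Feature2": 0.15, "Feature3": 0.05}
top_list = [{"s": "Feature1", "d": 0.25}, {"s": "Feature2", "d": 0.15}]
print(top_matches(absolute, top_list, k=2))
```

The `tol` parameter is there in case the values in the top list are rounded differently from the full dictionary; whether they are is not stated in this thread.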
Note that we don't necessarily compute feature importances for all columns, as this is a compute-heavy process. In most cases, we train a surrogate model (a random forest regressor) and use its feature importances to select the columns on which we then compute absolute feature importances.
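The two-stage selection can be sketched with scikit-learn as a stand-in: fit a `RandomForestRegressor` surrogate, rank columns by its impurity-based `feature_importances_`, and keep the top `k` as the columns to run the expensive computation on. Fitting the surrogate to the original model's predictions, the value of `k`, and the helper name `select_columns_with_surrogate` are assumptions, not DSS internals.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def select_columns_with_surrogate(X, model_predictions, feature_names, k, random_state=0):
    """Rank columns with a random forest surrogate and return the top k names."""
    # The surrogate learns to mimic the original model's outputs (assumption).
    surrogate = RandomForestRegressor(n_estimators=100, random_state=random_state)
    surrogate.fit(X, model_predictions)
    # Indices of columns sorted by surrogate importance, highest first.
    order = np.argsort(surrogate.feature_importances_)[::-1]
    return [feature_names[i] for i in order[:k]]


# Synthetic example: the "model" output depends on the first two columns only.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
model_predictions = 5.0 * X[:, 0] + 3.0 * X[:, 1]
names = [f"Feature{i}" for i in range(1, 6)]
selected = select_columns_with_surrogate(X, model_predictions, names, k=2)
print(selected)
```

In this sketch only the selected columns would then go through the Shapley-style computation, which keeps its cost bounded regardless of how many columns the dataset has.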
I hope that helps.