
What is globalExplanationsTopImportances ? How is it calculated?

Solved!
MNOP
Level 3

I'm using the API to retrieve the following information for a regression model.

 

details.get_performance_metrics()['globalExplanationsTopImportances']

 

This returns a list of dictionaries with keys "s" and "d", as follows:

 

[{"s": "Feature1", "d": 0.25}, {"s": "Feature2", "d": 0.15}]

 

What value is given as "d"? What are the criteria for selecting the top features here (I see a varying number of features in this list)?
Is there any way to get the importances of all the variables as a dictionary?


Operating system used: Windows

1 Solution
AlexisD
Dataiker

Hello !

Those values are computed using Shapley values. You can find more details about the process here.

`globalExplanationsTopImportances` contains the 10 largest feature importance values.

I believe you are getting your `details` from `get_trained_model_snippet`. You can get the whole absolute feature importance dictionary using `get_trained_model_details("id").details["globalExplanationsAbsoluteImportance"]` instead. The top 10 values of that dictionary should match the ones from the `globalExplanationsTopImportances` snippet.
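To turn the snippet's list into the plain `{feature: importance}` dictionary you asked about, the `"s"`/`"d"` entries can be reshaped directly. A minimal sketch using the sample values from your post (`importances_to_dict` is just an illustrative helper name, not a Dataiku API):

```python
# Sample payload mirroring the "s"/"d" structure returned by
# details.get_performance_metrics()['globalExplanationsTopImportances'].
top_importances = [{"s": "Feature1", "d": 0.25}, {"s": "Feature2", "d": 0.15}]

def importances_to_dict(entries):
    """Reshape [{"s": name, "d": value}, ...] into {name: value}."""
    return {entry["s"]: entry["d"] for entry in entries}

print(importances_to_dict(top_importances))
# {'Feature1': 0.25, 'Feature2': 0.15}
```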

Note that we don't necessarily compute the feature importances on all columns as this is a compute-heavy process. In most cases we compute a surrogate model (random forest regressor) and use the feature importances of this model to select the columns we will compute absolute feature importances on.
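As an illustration of that surrogate idea (this is a sketch, not Dataiku's actual implementation; the model settings, data, and selection threshold are invented for the example), a random forest regressor's importances can be used to shortlist columns before running the expensive Shapley computation:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# The target depends strongly on columns 0 and 1 only; the rest are noise.
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Fit a cheap surrogate model on all columns.
surrogate = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Keep only the columns whose surrogate importance clears a threshold;
# these are the ones that would get full feature-importance treatment.
selected = [i for i, imp in enumerate(surrogate.feature_importances_) if imp > 0.05]
print(selected)
```

With this synthetic data the two signal-bearing columns dominate the surrogate's importances, so only they survive the cut, which is why such a list can contain a varying number of features.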

I hope that helps.

