What is globalExplanationsTopImportances? How is it calculated?

Mohammed (Dataiku DSS Core Designer, Dataiku DSS ML Practitioner)
edited July 16 in Using Dataiku

I'm using the API to retrieve the following information for a regression model:

details.get_performance_metrics()['globalExplanationsTopImportances']

This returns a list of dictionaries with the keys "s" and "d", as follows:

[{"s": "Feature1", "d": 0.25}, {"s": "Feature2", "d": 0.15}]

What does the value d represent? What are the criteria for selecting the top features here (I see a varying number of features in this list)?
Is there any way to get the importance of all the variables as a dictionary?


Operating system used: Windows

Best Answer

  • AlexisD (Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner)

    Hello!

    These values are computed with Shapley values. You can find more details about the process here.
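    To illustrate the idea behind Shapley values, here is a minimal pure-Python sketch of exact Shapley values. It assumes that a single baseline row stands in for "missing" features and that a feature's global importance is the mean absolute Shapley value over rows. The function names `shapley_values` and `global_importance` are hypothetical, and this is not Dataiku's actual implementation.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict() at row x, relative to a baseline row.

    Features outside a coalition take their baseline value (a simple
    single-background-row assumption, not necessarily what DSS does).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                # Classic Shapley weight for a coalition of this size
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def global_importance(predict, rows, baseline):
    """Mean absolute Shapley value per feature over a set of rows."""
    totals = [0.0] * len(baseline)
    for row in rows:
        for i, value in enumerate(shapley_values(predict, row, baseline)):
            totals[i] += abs(value)
    return [total / len(rows) for total in totals]


# Toy linear model: the third feature has no effect
def predict(row):
    return 3 * row[0] + 1 * row[1] + 0 * row[2]


rows = [[1, 2, 5], [-1, 0, 3]]
importances = global_importance(predict, rows, baseline=[0, 0, 0])
print(importances)
```

    For a linear model each Shapley value reduces to weight × (value − baseline), which makes the output easy to sanity-check. The loops visit every coalition of the other features, so the cost grows exponentially with the number of columns.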

    The `globalExplanationsTopImportances` list contains the 10 highest feature importance values.

    I believe you are getting your `details` from `get_trained_model_snippet`. You can get the full absolute feature importance dictionary using `get_trained_model_details("id").details["globalExplanationsAbsoluteImportance"]` instead. The top 10 values of that dictionary should match the ones in the `globalExplanationsTopImportances` snippet.
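    If you already have the full absolute importance dictionary, the sketch below rebuilds a top-10 list in the same `{"s": ..., "d": ...}` shape and converts such a list back into a plain dictionary. It assumes `globalExplanationsAbsoluteImportance` maps feature names to importance values. The helper names, project key and IDs are hypothetical placeholders; the commented-out retrieval uses the `dataiku` client calls `api_client`, `get_project`, `get_ml_task` and `get_trained_model_details`.

```python
# Inside DSS, the full dictionary could be fetched roughly like this
# (project key and IDs are hypothetical placeholders):
#   import dataiku
#   client = dataiku.api_client()
#   ml_task = client.get_project("MY_PROJECT").get_ml_task("ANALYSIS_ID", "MLTASK_ID")
#   details = ml_task.get_trained_model_details("MODEL_ID")
#   full = details.details["globalExplanationsAbsoluteImportance"]


def top_importances(absolute_importance, k=10):
    """Return the k largest importances as [{"s": name, "d": value}, ...]."""
    ranked = sorted(absolute_importance.items(), key=lambda item: item[1], reverse=True)
    return [{"s": name, "d": value} for name, value in ranked[:k]]


def as_dict(top_list):
    """Convert a [{"s": ..., "d": ...}] list back into {name: importance}."""
    return {entry["s"]: entry["d"] for entry in top_list}


# Hypothetical importances for 12 features
full = {f"Feature{i}": i / 100 for i in range(1, 13)}
top = top_importances(full)
print(len(top), top[0])  # 10 {'s': 'Feature12', 'd': 0.12}
```

    Applying `as_dict` to the snippet's list and comparing it with the top entries of the full dictionary is a quick way to confirm the two agree.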

    Note that we don't necessarily compute feature importances on all columns, as this is a compute-heavy process. In most cases, we train a surrogate model (a random forest regressor) and use its feature importances to select the columns on which we compute absolute feature importances.
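    The surrogate-based column selection can be pictured with the following sketch. It assumes the surrogate's importances are already available as a dict, for example from scikit-learn's `RandomForestRegressor.feature_importances_` after fitting it on the model's inputs and predictions (an assumption about how the surrogate is trained). `preselect_features`, `max_features` and the sample values are hypothetical. The second helper counts the prediction pairs that exact Shapley values need per row, which shows why trimming the column list matters.

```python
def preselect_features(surrogate_importances, max_features):
    """Keep the columns that the surrogate model ranks highest."""
    ranked = sorted(surrogate_importances, key=surrogate_importances.get, reverse=True)
    return ranked[:max_features]


def exact_shapley_evaluations(n_features):
    """Prediction pairs per row for exact Shapley values: each feature is
    compared across all 2 ** (n_features - 1) coalitions of the others."""
    return n_features * 2 ** (n_features - 1)


# Hypothetical surrogate importances (random forest importances sum to 1)
surrogate = {"age": 0.05, "income": 0.40, "region": 0.15, "tenure": 0.30, "id": 0.10}
selected = preselect_features(surrogate, max_features=3)
print(selected)                       # ['income', 'tenure', 'region']
print(exact_shapley_evaluations(20))  # 10485760
print(exact_shapley_evaluations(3))   # 12
```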

    I hope that helps.
