Looking at this post for guidance:

led me to this documentation:

which documents a function called compute_shapley_feature_importance().

When I look at my model (an XGBoost binary classifier), I see an option for Shapley as well as Gini importance. Because my model is given only a single variable (a numeric vector of length 3501), the Shapley importance always says the array is 100% important (thanks, Captain Obvious), but the Gini importance actually shows me the importance of the individual elements of my vector.

I would like to access the Gini importance via the API so I can visualize this data (I want to graph the vector, then use the Gini importance to highlight the important parts of the vector with vertical reference lines). Sadly, I can't find any documentation that explains how to access the Gini importance. This request is further complicated by the fact that my model is partitioned, so I actually want to access each partition's variable importance.

I've googled it and come up empty-handed.

Can anybody lend a hand and point me to some documentation?



Operating system used: Red Hat

Best Answer

  • Alexandru
    Hi @Jason
    Your problem probably comes from the fact that the model is partitioned:

    Are you able to retrieve the feature importance with something like this on a non-partitioned model?

    import dataiku
    import pandas as pd

    # analysis_id, ml_task_id and trained_model_id must be set to your own IDs
    client = dataiku.api_client()
    project = client.get_project(dataiku.default_project_key())
    analysis = project.get_analysis(analysis_id)
    ml_task = analysis.get_ml_task(ml_task_id)
    # trained_model_ids = ml_task.get_trained_models_ids()  # lists the available model IDs
    trained_model_detail = ml_task.get_trained_model_details(trained_model_id)
    feature_importance = trained_model_detail.get_raw()
    # Use "iperf" when it is present, otherwise fall back to "perf"
    if 'iperf' in feature_importance.keys():
        raw_importance = feature_importance.get("iperf").get("rawImportance")
    else:
        raw_importance = feature_importance.get("perf").get("variables_importance")
    feature_importance_df = pd.DataFrame(raw_importance)
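
Once the raw importance has been retrieved, picking out the elements to highlight could look roughly like the sketch below. It rests on assumptions that are not confirmed in this thread: that `rawImportance` is a dict with parallel `variables` and `importances` lists, and that each feature name ends with the element's position in the vector. The helper names `top_features` and `element_index` and the sample data are hypothetical.

```python
import re


def top_features(raw_importance, n=10):
    """Return the n (name, importance) pairs with the highest importance."""
    # Assumption: parallel "variables" and "importances" lists
    pairs = zip(raw_importance["variables"], raw_importance["importances"])
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:n]


def element_index(feature_name):
    """Extract a trailing integer from a feature name, or None if there is none."""
    # Assumption: the feature name ends with the element's position
    match = re.search(r"(\d+)$", feature_name)
    return int(match.group(1)) if match else None


# Made-up sample data, for illustration only
sample = {
    "variables": ["spectrum:0", "spectrum:1", "spectrum:2", "spectrum:3"],
    "importances": [0.05, 0.60, 0.10, 0.25],
}

top = top_features(sample, n=2)
positions = [element_index(name) for name, _ in top]
```

Each value in `positions` could then be passed to `matplotlib.pyplot.axvline` to draw a vertical reference line on the plot of the vector. If the real feature names follow a different pattern, `element_index` would need to be adapted.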



  • Jason
    I have successfully retrieved them from a non-partitioned model in the past, but I have not tried with this set of models, nor since upgrading to version 12. Here's hoping this is added to the API in the future. Thanks!

