Getting Gini variable importance via API?

Solved!
Jason
Level 4
I started from this post: https://community.dataiku.com/t5/Using-Dataiku/How-to-get-Variable-Importance-from-Model/m-p/3589

which led me to this documentation: https://developer.dataiku.com/latest/api-reference/python/ml.html#exploration-of-results

where a function called compute_shapley_feature_importance() is documented.

When I look at my model (an XGBoost binary classifier), I see an option for Shapley importance as well as Gini importance. Because my model is given only a single variable (a numeric vector of length 3501), the Shapley importance always says the vector is 100% important (thanks, Captain Obvious), but the Gini importance actually shows me the importance of the individual elements of my vector.

I would like to access the Gini importance via the API so I can visualize this data (I want to graph the vector, then use the Gini importance to highlight the important parts of the vector with vertical reference lines). Sadly, I can't find any documentation that explains how to access the Gini importance. This request is further complicated by the fact that my model is partitioned, so I actually want to access each partition's variable importance.

I've googled it and come up empty-handed.

Can anybody lend a hand and point me to some documentation?

 

Thanks,

-Jason


Operating system used: Red Hat

AlexT
Dataiker

Hi @Jason ,
Your problem probably comes from the fact that the model is partitioned:

https://community.dataiku.com/t5/Using-Dataiku/Retrieve-feature-importance-from-partioned-model/m-p/...


Are you able to retrieve the feature importance with something like this on a non-partitioned model? 

import dataiku
import pandas as pd

# Example IDs: replace with the values from your own project
analysis_id = "r1111"
ml_task_id = "q1111"
trained_model_id = "A-PROJECT-KEY-rYUdqksI-qZZ8xzfM-s5-pp1-m2"

client = dataiku.api_client()
project = client.get_project(dataiku.default_project_key())
analysis = project.get_analysis(analysis_id)
ml_task = analysis.get_ml_task(ml_task_id)
# To list the available model IDs:
# trained_model_ids = ml_task.get_trained_models_ids()

trained_model_detail = ml_task.get_trained_model_details(trained_model_id)

# Raw model details; the importance is read from "iperf" if present, else from "perf"
raw_details = trained_model_detail.get_raw()
if "iperf" in raw_details:
    raw_importance = raw_details.get("iperf").get("rawImportance")
else:
    raw_importance = raw_details.get("perf").get("variables_importance")

feature_importance_df = pd.DataFrame(raw_importance)
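
Once raw_importance has been retrieved, picking out the most important features (for example, to decide where to draw vertical reference lines on a plot) could look roughly like the sketch below. It assumes the payload is a dict with two parallel lists named "variables" and "importances"; that shape is an assumption, so check it against what get_raw() actually returns on your instance. top_features is a hypothetical helper, not part of the Dataiku API, and the example values are made up.

```python
def top_features(raw_importance, n=10):
    """Return the n (name, importance) pairs with the highest importance."""
    # Assumed shape: {"variables": [...], "importances": [...]}
    pairs = zip(raw_importance["variables"], raw_importance["importances"])
    # Sort by importance, largest first, and keep the first n pairs
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:n]


# Made-up values for illustration only
example = {
    "variables": ["f0", "f1", "f2", "f3"],
    "importances": [0.10, 0.55, 0.05, 0.30],
}
print(top_features(example, n=2))
```

The returned names can then be mapped back to positions in the vector, depending on how the feature names are generated for your model.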


Thanks

Jason
Level 4
Author

I have successfully retrieved them from a non-partitioned model in the past. I have not tried with this set of models, nor since I upgraded to version 12. Here's hoping this is added to the API in the future. Thanks!
