Extracting variable importance from model recipe through code recipe

shreyanshv6
Level 1

Hi, I want to automate the extraction of variable importance from a model as a data frame, and use it for further processing in a Python recipe.

However, when I use the method from https://community.dataiku.com/t5/Using-Dataiku/How-to-get-Variable-Importance-from-Model/td-p/3589 [trained_model_detail.get_raw().get('perf').get('variables_importance')], I get different importance scores than the ones visible in the model.

Also, dataiku.Model("hSO3BRlk").list_versions() shows only the top 10 variables by importance score, but I need more than 10.

I am running only one model, so there is no possibility of having picked the wrong model.

I would like to move forward with the pre-written code below, in a Python recipe, starting from the deployed model:


import dataiku

# the class name is capitalized: dataiku.Model, not dataiku.model
model_1 = dataiku.Model('hSO3BRlk')
pred_1 = model_1.get_predictor()

Kindly help.

SarinaS
Dataiker

Hi @shreyanshv6,

Can you check what you get if you print out trained_model_ids in your code? For example:

[Screenshot: Screen Shot 2023-01-11 at 6.10.21 PM.png]

I want to make sure that you are referring to the correct ID, just in case more than one is returned.

I also found that the following code works on DSS 11, so I would suggest trying it first:

import dataiku

client = dataiku.api_client()
project = client.get_default_project()

# make sure you are pointing to your analysis ID here
analysis = project.get_analysis('IEyUbkB7')

# and to your ML task ID here
mltask = analysis.get_ml_task('S74zMovS')
trained_model_ids = mltask.get_trained_models_ids()
print(trained_model_ids)

# here, I'm pointing to my first trained_model_id, but this may vary
prediction_results = mltask.get_trained_model_details(trained_model_ids[0])
values = prediction_results.get_raw()['iperf']['rawImportance']

# print each variable next to its importance score
for variable, importance in zip(values['variables'], values['importances']):
    print(variable, importance)
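
Since the goal is a table with more than the top 10 variables, here is a minimal sketch of how the 'variables' and 'importances' lists could be paired into sorted rows. It assumes the rawImportance dictionary has the two parallel lists used in the code above; the helper name importance_rows, the top_n parameter, and the sample feature names and scores are hypothetical and only there for illustration.

```python
def importance_rows(raw_importance, top_n=None):
    """Pair each variable with its importance, sorted from most to least important.

    raw_importance is assumed to look like:
    {'variables': [...names...], 'importances': [...scores...]}
    """
    pairs = zip(raw_importance["variables"], raw_importance["importances"])
    rows = [{"variable": name, "importance": float(score)} for name, score in pairs]
    rows.sort(key=lambda row: row["importance"], reverse=True)
    # top_n=None keeps every variable instead of cutting off at a fixed count
    return rows if top_n is None else rows[:top_n]


# Hypothetical sample standing in for
# prediction_results.get_raw()['iperf']['rawImportance']
sample = {
    "variables": ["age", "income", "tenure"],
    "importances": [0.2, 0.5, 0.3],
}

rows = importance_rows(sample)
for row in rows:
    print(row["variable"], row["importance"])
```

In a recipe, passing the rows to pandas.DataFrame(rows) should give a data frame with one line per variable, which can then be written to an output dataset.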


If the results still don't look right to you, can you please attach a screenshot of your full variable importance screen in visual analysis, including the URL which will contain the analysis ID. 

Then please paste your trained_model_ids results and the results you get when printing out the importance values that are different from what you expect. 

Thanks,
Sarina
