Extracting variable importance from model recipe through code recipe
Hi, I want to automate the extraction of variable importance from a model as a data frame and use it in further processing in a Python recipe.
However, when I use the method from https://community.dataiku.com/t5/Using-Dataiku/How-to-get-Variable-Importance-from-Model/td-p/3589 [trained_model_detail.get_raw().get('perf').get('variables_importance')], I get different importance scores than the ones shown in the model.
Also, dataiku.Model("hSO3BRlk").list_versions() shows only the top 10 variables by importance score, but I need more than 10.
I am running only one model, so there is no possibility of picking the wrong model.
I would like to build on the code below, written in a Python recipe against the deployed model:
model_1 = dataiku.Model('hSO3BRlk')
pred_1 = model_1.get_predictor()
Kindly help.
Answers
Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 317 Dataiker
Hi @shreyanshv6,
Can you check what you get if you print out trained_model_ids in your code? I want to make sure that you are referring to the correct ID, just in case more than one is returned.
I also found on DSS 11 that the following code works, which I would suggest trying first:
import dataiku
client = dataiku.api_client()
project = client.get_default_project()
# make sure you are pointing to your analysis ID here
analysis = project.get_analysis('IEyUbkB7')
mltask = analysis.get_ml_task('S74zMovS')
trained_model_ids = mltask.get_trained_models_ids()
print(trained_model_ids)
# here, I'm pointing to my first trained_model_id, but this may vary
prediction_results = mltask.get_trained_model_details(trained_model_ids[0])
values = prediction_results.get_raw()['iperf']['rawImportance']
# print every variable next to its importance score
for i in range(len(values['variables'])):
    print(values['variables'][i], values['importances'][i])
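Since the goal is a data frame, here is a minimal sketch of how the rawImportance dictionary could be turned into sorted rows. It assumes the dictionary has the same shape as in the code above (parallel 'variables' and 'importances' lists). The helper name importance_rows and the sample values are hypothetical and only there for illustration.

```python
def importance_rows(raw_importance, top_n=None):
    """Pair each variable with its importance, sorted by importance (descending)."""
    variables = raw_importance["variables"]
    importances = raw_importance["importances"]
    if len(variables) != len(importances):
        raise ValueError("variables and importances must have the same length")
    rows = sorted(zip(variables, importances), key=lambda row: row[1], reverse=True)
    # top_n=None keeps every variable, not just the first few
    return rows if top_n is None else rows[:top_n]


# Hypothetical values standing in for prediction_results.get_raw()['iperf']['rawImportance']
sample = {"variables": ["income", "age", "tenure"], "importances": [0.3, 0.5, 0.2]}
rows = importance_rows(sample)
for name, score in rows:
    print(name, score)
```

In a recipe, pandas.DataFrame(rows, columns=['variable', 'importance']) would turn these rows into a data frame that can be written to an output dataset.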
If the results still don't look right to you, can you please attach a screenshot of your full variable importance screen in the visual analysis, including the URL, which contains the analysis ID?
Then please paste your trained_model_ids output and the importance values you get that differ from what you expect.
Thanks,
Sarina