Extracting variable importance from model recipe through code recipe

shreyanshv6
shreyanshv6 Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 6

Hi, I want to automate extracting variable importance from a model as a data frame, so that I can use it in further processing in a Python recipe.

However, when I use the method from https://community.dataiku.com/t5/Using-Dataiku/How-to-get-Variable-Importance-from-Model/td-p/3589 [trained_model_detail.get_raw().get('perf').get('variables_importance')], I get different importance scores than the ones visible in the model.

Also, dataiku.Model("hSO3BRlk").list_versions() shows only the top 10 variables by importance score, but I need more than 10.

I am running only one model, so there is no possibility of having picked the wrong one.

I would like to move forward from the code below, which I already have in a Python recipe that uses the deployed model:


import dataiku

model_1 = dataiku.Model('hSO3BRlk')

pred_1 = model_1.get_predictor()


Kindly help.

Answers

  • Sarina
    Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 317 Dataiker
    edited July 2024

    Hi @shreyanshv6,

    Can you check what you get if you print out trained_model_ids in your code? For example:

    [Screenshot: Screen Shot 2023-01-11 at 6.10.21 PM.png]

    I want to make sure that you are referring to the correct ID, just in case more than one is returned.

    I also found on DSS 11 that the following code works, which I would suggest trying first:

    import dataiku

    client = dataiku.api_client()
    project = client.get_default_project()
    
    # make sure you are pointing to your analysis ID here
    analysis = project.get_analysis('IEyUbkB7')
    
    # and to your ML task ID here
    mltask = analysis.get_ml_task('S74zMovS')
    trained_model_ids = mltask.get_trained_models_ids()
    print(trained_model_ids)
    
    # here, I'm pointing to my first trained_model_id, but this may vary
    prediction_results = mltask.get_trained_model_details(trained_model_ids[0])
    values = prediction_results.get_raw()['iperf']['rawImportance']
    # print each variable next to its raw importance score
    for variable, importance in zip(values['variables'], values['importances']):
        print(variable, importance)
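
Since the goal is a data frame rather than printed output, here is a minimal sketch of how the same values could be turned into sorted rows. It assumes the rawImportance dictionary has the two parallel lists used above ('variables' and 'importances'). The helper name importance_rows and the sample values are hypothetical, for illustration only.

```python
def importance_rows(raw_importance):
    """Pair each variable with its importance and sort by importance, descending."""
    variables = raw_importance["variables"]
    importances = raw_importance["importances"]
    if len(variables) != len(importances):
        raise ValueError("variables and importances must have the same length")
    rows = [
        {"variable": name, "importance": score}
        for name, score in zip(variables, importances)
    ]
    return sorted(rows, key=lambda row: row["importance"], reverse=True)


# Hypothetical sample standing in for
# prediction_results.get_raw()['iperf']['rawImportance']
sample = {
    "variables": ["age", "income", "tenure"],
    "importances": [0.2, 0.5, 0.3],
}

rows = importance_rows(sample)
for row in rows:
    print(row["variable"], row["importance"])
```

In the recipe, you would pass values (from the code above) instead of sample. If pandas is available, pd.DataFrame(rows) should then give a data frame with one row per variable, not just the top 10.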


    If the results still don't look right to you, can you please attach a screenshot of your full variable importance screen in visual analysis, including the URL, which will contain the analysis ID?

    Then please paste your trained_model_ids results and the results you get when printing out the importance values that are different from what you expect.

    Thanks,
    Sarina
