How to get Variable Importance from Model

lugow Partner, Registered Posts: 3 Partner
Hi,

I want to programmatically consume the data shown in the "Variable Importance" area under the "Interpretation" section of a trained model. I can export this data manually, but I would like to get it into a dataset (or something similar) so that I can build pipelines on top of it.

Is that possible?

Best Answer

  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭
    edited July 18 Answer ✓

    Hi,

    Here is a piece of code that retrieves the raw variable importance data for a trained model in an Analysis:


    import dataiku
    import pandas as pd

    # Connect to the DSS instance and open the current project
    client = dataiku.api_client()
    project = client.get_project(dataiku.default_project_key())

    # Example IDs: replace them with your own analysis and ML task IDs
    analysis = project.get_analysis("rGRd5qWg")
    ml_task = analysis.get_ml_task("fm97lWNq")
    trained_model_ids = ml_task.get_trained_models_ids()

    # Take the first trained model and read its raw details as a Python dict
    trained_model_detail = ml_task.get_trained_model_details(trained_model_ids[0])
    feature_importance = trained_model_detail.get_raw().get("iperf").get("rawImportance")
    feature_importance_df = pd.DataFrame(feature_importance)

    Note that you can find the analysis_id, ml_task_id and trained_model_id in the URL of the model page. Alternatively, you can retrieve them through our API: https://doc.dataiku.com/dss/latest/python-api/rest-api-client/ml.html#exploration-of-results
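To make the result easier to build on, the raw importance data can be turned into a sorted table. The following is only a minimal sketch: it assumes that "rawImportance" holds two parallel lists named "variables" and "importances", which may differ between DSS versions, and it runs on a hypothetical stand-in dict rather than on a real `get_raw()` result. The helper name `importance_to_frame` is made up for this example.

```python
import pandas as pd


def importance_to_frame(raw_details):
    """Return variable importances as a DataFrame, most important first.

    raw_details is the dict returned by trained_model_detail.get_raw().
    Assumption: rawImportance contains "variables" and "importances" lists.
    """
    # Fall back to empty dicts so a missing key does not raise an error
    raw_importance = (raw_details.get("iperf") or {}).get("rawImportance") or {}
    df = pd.DataFrame(raw_importance)
    if df.empty or "importances" not in df.columns:
        return df
    return df.sort_values("importances", ascending=False).reset_index(drop=True)


# Hypothetical stand-in for trained_model_detail.get_raw()
sample_details = {
    "iperf": {
        "rawImportance": {
            "variables": ["age", "income", "tenure"],
            "importances": [0.2, 0.5, 0.3],
        }
    }
}

importance_df = importance_to_frame(sample_details)
```

In a Python recipe, the resulting DataFrame could then be written to an output dataset, for example with `dataiku.Dataset("variable_importance").write_with_schema(importance_df)`, where the dataset name is a placeholder for one of the recipe's outputs.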

    Hope it helps,

    Alex

Answers

  • lugow
    lugow Partner, Registered Posts: 3 Partner
    This is awesome! Thanks!
  • omri17
    omri17 Registered Posts: 4 ✭✭✭✭

    Hi Alex,

    This is really awesome.
    I saw that you used the "get_raw()" method, but I didn't see any reference to it in the docs. Is it documented anywhere?
    How can I get, for example, regression coefficients with a similar approach?


    Thanks,

  • kkaminsky
    kkaminsky Registered Posts: 1 ✭✭✭

    I would also like to know the answer to this.

  • ClemenceB
    ClemenceB Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Product Ideas Manager Posts: 18 Dataiker
    edited July 17

    Hi @kkaminsky,

    "get_raw()" is a method that returns all the trained model details as a Python dict.
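Since the result is a plain dict, one way to discover which fields are available is to list its nested keys. Below is a minimal sketch of that idea. The helper `list_key_paths` is hypothetical, and the sample dict only imitates the two paths mentioned in this thread; a real `get_raw()` result contains many more entries.

```python
def list_key_paths(node, prefix="", max_depth=2):
    """List dotted key paths of a nested dict, down to max_depth levels."""
    paths = []
    if not isinstance(node, dict) or max_depth == 0:
        return paths
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        paths.append(path)
        # Recurse into nested dicts, one level shallower each time
        paths.extend(list_key_paths(value, path, max_depth - 1))
    return paths


# Hypothetical stand-in for trained_model_detail.get_raw()
sample_details = {
    "modeling": {"algorithm": "EXAMPLE_ALGORITHM"},
    "iperf": {"rawImportance": {"variables": [], "importances": []}},
}

key_paths = list_key_paths(sample_details)
for path in key_paths:
    print(path)
```

Calling `list_key_paths(trained_model_detail.get_raw())` on a real model should show where fields such as "iperf" and "modeling" sit.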

    To get regression coefficients you can use the following piece of code:

    import dataiku
    import pandas as pd

    # analysis_id and ml_task_id must be set beforehand (see the URL of the model page)
    client = dataiku.api_client()
    project = client.get_project(dataiku.default_project_key())
    analysis = project.get_analysis(analysis_id)
    ml_task = analysis.get_ml_task(ml_task_id)
    trained_model_ids = ml_task.get_trained_models_ids()

    # Index 1 selects the second trained model; adjust it to point at your regression model
    trained_model_detail = ml_task.get_trained_model_details(trained_model_ids[1])
    regression_coefficients = trained_model_detail.get_raw().get('iperf').get('lmCoefficients')
    regression_coefficients = pd.DataFrame(regression_coefficients)


    If the regression_coefficients table is empty, make sure the model you're calling is a regression:

    trained_model_detail.get_raw().get('modeling').get('algorithm')
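The chained `.get()` calls raise an AttributeError as soon as an intermediate key is missing, so a small helper that walks the dict defensively can make this check more robust. This is a sketch under assumptions: the helpers `dig` and `coefficients_or_reason` are made up for illustration, and the layout of "lmCoefficients" ("variables" and "coefs" lists) as well as the algorithm names in the sample dicts are hypothetical.

```python
import pandas as pd


def dig(node, *keys):
    """Follow keys through nested dicts; return None if any level is missing."""
    for key in keys:
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    return node


def coefficients_or_reason(raw_details):
    """Return (DataFrame, status) for the coefficients of a trained model."""
    coefficients = dig(raw_details, "iperf", "lmCoefficients")
    if not coefficients:
        # No coefficients: report the algorithm so the cause is easy to spot
        algorithm = dig(raw_details, "modeling", "algorithm")
        return pd.DataFrame(), f"no lmCoefficients found (algorithm: {algorithm})"
    return pd.DataFrame(coefficients), "ok"


# Hypothetical stand-ins for trained_model_detail.get_raw()
linear_details = {
    "modeling": {"algorithm": "EXAMPLE_LINEAR_MODEL"},
    "iperf": {"lmCoefficients": {"variables": ["x1", "x2"], "coefs": [1.5, -0.5]}},
}
tree_details = {"modeling": {"algorithm": "EXAMPLE_TREE_MODEL"}, "iperf": {}}

linear_df, linear_status = coefficients_or_reason(linear_details)
tree_df, tree_status = coefficients_or_reason(tree_details)
```

Returning a status string alongside the DataFrame keeps the "empty table" case explicit instead of leaving the caller to guess why no coefficients came back.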



    Let me know if it helps.

    Clémence
