About Shapley calculations

gnaldi62

Hi,

We're running into serious problems with the Shapley calculation for a customer. I've found some useful documentation at https://doc.dataiku.com/dss/latest/machine-learning/supervised/explanations.html and a few related threads in the community (https://community.dataiku.com/t5/Using-Dataiku/SHAP-Shapley-values-in-Dataiku/m-p/22241, https://community.dataiku.com/t5/Using-Dataiku/Interpretation-of-Shapley-values-in-Dataiku/m-p/7233, https://community.dataiku.com/t5/General-Discussion/Individual-Explanations/m-p/15378).

Because of a serious performance issue (the calculation takes more than 16 hours on the local DSS), we're trying to find a workaround. The PDF referenced in the documentation says:

"If the model does not provide feature importances, they are computed by training a random forest surrogate model and using its feature importances"

This is our main obstacle in trying to reproduce the same Shapley values. Which kind of model is used in this case? Is there any further reference?

Thanks. Rgds.

Giuseppe

Best Answer

  • AdrienL
    AdrienL Dataiker, Alpha Tester Posts: 196 Dataiker
    Answer ✓

    Hi,

    If the model does not expose feature importances from which to determine the most impactful columns to use for the Shapley value estimation, DSS builds a surrogate model using a random forest regressor (100 trees, max depth 5, trained on a subsample of at most 1000 rows) and uses the feature importances of this surrogate model.
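For anyone trying to reproduce this outside DSS, the surrogate step described above could be sketched roughly as follows with scikit-learn. This is only an approximation under stated assumptions, not the actual DSS code: the function name `surrogate_feature_importances`, the `predict_fn` callable, the choice to fit the surrogate on the original model's predictions, and the random seed are all hypothetical. Only the three hyperparameters (100 trees, max depth 5, at most 1000 rows) come from the answer.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def surrogate_feature_importances(predict_fn, X, max_rows=1000, random_state=0):
    """Approximate feature importances of a black-box model via a surrogate.

    predict_fn: hypothetical callable returning one numeric prediction per row
                (for a classifier this could be a class probability).
    X:          2-D numeric array of already preprocessed features.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(random_state)

    # Subsample to at most `max_rows` rows, as described in the answer.
    if X.shape[0] > max_rows:
        idx = rng.choice(X.shape[0], size=max_rows, replace=False)
        X = X[idx]

    # Assumption: the surrogate is trained to mimic the original model's output.
    y = np.asarray(predict_fn(X), dtype=float)

    # Hyperparameters taken from the answer: 100 trees, max depth 5.
    surrogate = RandomForestRegressor(
        n_estimators=100, max_depth=5, random_state=random_state
    )
    surrogate.fit(X, y)
    return surrogate.feature_importances_


if __name__ == "__main__":
    # Toy black-box model that only depends on columns 0 and 2.
    demo_rng = np.random.default_rng(42)
    X_demo = demo_rng.normal(size=(3000, 4))
    importances = surrogate_feature_importances(
        lambda A: 3 * A[:, 0] + A[:, 2] ** 2, X_demo
    )
    # Rank columns from most to least important.
    print(np.argsort(importances)[::-1])
```

The resulting ranking would then be used to pick the most impactful columns for the Shapley estimation. Exact importances will differ from DSS unless the same rows, preprocessing and seed are used.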

Answers

  • gnaldi62

    Hi Adrien,

    Thanks for your quick response. Rgds.

    Giuseppe
