Decision Tree Interpretation

jm596353
jm596353 Registered Posts: 1 ✭✭✭

I created a Decision Tree predictive model, and I was wondering if you could help me understand the difference between the percentages under Probabilities and those under Target Classes when I view the decision tree itself. What does each of these percentages represent? Below is a screenshot.

jm596353_0-1615490324263.jpeg

Answers

  • KimmyC
    KimmyC Dataiker Posts: 34 Dataiker

    Hi,

    Probabilities are the probabilities of each class as predicted by the tree, whereas Target Classes is the distribution of the training-set data that falls into the given tree node.

    Hope this helps!

    Kim

  • pvannies
    pvannies Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron, Dataiku DSS Adv Designer, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Frontrunner 2022 Participant, Neuron 2023 Posts: 16 Neuron

    Hi @KimmyC


    Could you detail the explanation a bit more?

    To get a class probability, I would have to provide an input X containing values for the features (I assume you use the predict_proba(X) method of a DecisionTreeClassifier from sklearn under the hood). At a given tree node, which inputs are used to calculate this class probability? Is it the average of the predicted probabilities over all the datapoints the node contains, or a single prediction made on some averaged X computed from those datapoints?


  • Jean-Yves
    Jean-Yves Dataiker Posts: 14 Dataiker

    Hi,

    So the probabilities that you see under TARGET CLASSES are derived from the proportion of samples in the node that belong to each class.

    The probabilities under PROBABILITIES are what the model would predict if the node were final (i.e., a leaf). All the observations falling into that node would receive the same probability prediction, so there is no need to take any average.
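To make the two quantities concrete, here is a minimal sketch using scikit-learn directly. It assumes a plain, unweighted DecisionTreeClassifier (as guessed earlier in the thread) on the iris dataset, which is chosen only for illustration; it is not DSS's own code, and DSS may compute or display these values differently. It checks that the class proportions stored at each node match the training samples that reach it, and that every sample landing in a leaf gets that leaf's proportions as its predicted probabilities.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Illustrative data and settings (assumptions, not taken from the thread)
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
tree = clf.tree_

# tree.value has shape (n_nodes, n_outputs, n_classes). Depending on the
# scikit-learn version it holds class counts or class fractions, so we
# normalise each row to get per-node class proportions either way.
node_values = tree.value[:, 0, :]
node_proportions = node_values / node_values.sum(axis=1, keepdims=True)

# "Target classes" view: class distribution of the training samples
# that pass through each node (internal nodes and leaves alike).
path = clf.decision_path(X).toarray().astype(bool)  # (n_samples, n_nodes)
for node in range(tree.node_count):
    counts = np.bincount(y[path[:, node]], minlength=clf.n_classes_)
    assert np.allclose(counts / counts.sum(), node_proportions[node])

# "Probabilities" view: what the model predicts. Each sample receives the
# proportions of the leaf it lands in, so no averaging is involved.
leaf_ids = clf.apply(X)
proba = clf.predict_proba(X)
assert np.allclose(proba, node_proportions[leaf_ids])
```

In this unweighted setup the two views coincide on the training data. With scikit-learn they can diverge when sample or class weights are used, since the stored node values are then weighted; whether that explains a difference seen in DSS is not something this sketch establishes.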

    I hope this helps!

    Best regards,

    Jean-Yves


  • Stephaniav
    Stephaniav Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 3 Partner

    Thank you for the explanations about the interpretation of the tree itself.

    But in the case of a random forest model, which considers several trees for the final decision, why does DSS show only 2 trees? Are they just examples of the trees used?

  • Jean-Yves
    Jean-Yves Dataiker Posts: 14 Dataiker

    Hello!

    Thank you for your question! DSS will only show a limited number of trees when the number of nodes in the trees is too high. Note that it's not always 2.
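For context on how several trees feed into one decision, here is a small sketch with scikit-learn's RandomForestClassifier. The dataset and hyperparameters are illustrative assumptions, and this says nothing about how DSS chooses which trees to display. It shows that the fitted forest keeps every tree, that each tree has its own node count, and that the forest's predicted probabilities are the mean of the per-tree probabilities.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Illustrative data and settings (assumptions, not taken from the thread)
X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(
    n_estimators=10, max_depth=3, random_state=0
).fit(X, y)

# The forest holds one fitted decision tree per estimator.
n_trees = len(forest.estimators_)
node_counts = [t.tree_.node_count for t in forest.estimators_]
print(n_trees, "trees, node counts:", node_counts)

# The forest's class probabilities are the average over all trees.
per_tree = np.stack([t.predict_proba(X) for t in forest.estimators_])
averaged = per_tree.mean(axis=0)
assert np.allclose(averaged, forest.predict_proba(X))
```

All trees contribute to the prediction regardless of how many a viewer chooses to render.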

    Jean-Yves

  • Masterergy
    Masterergy Registered Posts: 2

    I realize this thread is a couple of years old, but in case you're still wondering, or for anyone else who stumbles across this, let's chat about Decision Tree Probabilities and Target Classes. The percentages in Probabilities typically indicate the likelihood of each class being the true class for a given leaf node.
    This concept of probability kind of reminds me of making a d20 roll in tabletop games. Each face of a d20 die has a 5% chance of landing face up, right? But what you actually do with that roll (whether it's a hit, a miss, or a critical) depends on the 'class' or situation you're in.
