
Decision Tree Interpretation

jm596353
Level 1

I created a Decision Tree predictive model, and I was wondering if you could help me understand the difference between the percentages shown under Probabilities and under Target Classes when I view the decision tree itself. What does each of these percentages represent? Below is a screenshot.

 

jm596353_0-1615490324263.jpeg

 

3 Replies
KimmyC
Dataiker

Hi,

Probabilities are the probabilities of each class as predicted by the tree, whereas Target Classes is the class distribution of the training-set data that falls into the given tree node.

Hope this helps!

Kim

pvannies
Level 2

Hi @KimmyC 

Could you detail the explanation a bit more?

To get a class probability, I would have to provide an input X containing values for the features (I assume you use the predict_proba(X) method of sklearn's DecisionTreeClassifier under the hood). At a given tree node, which inputs are used to calculate this class probability? Is it the average of the predicted probabilities over all the data points the node contains, or a single prediction made from some averaged X computed from those data points?


Jean-Yves
Dataiker

Hi, 

The percentages that you see under TARGET CLASSES are the proportions of samples in the node that belong to each class.

The probabilities under PROBABILITIES are what the model would predict if the node were final (i.e., a leaf). All observations falling into that node would receive the same predicted probabilities, so there is no need to take an average.
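To illustrate the idea outside DSS, here is a minimal sketch that assumes a plain scikit-learn DecisionTreeClassifier with no class or sample weights. This thread does not confirm what DSS does internally, and the iris dataset and the helper name class_proportions are only illustrative choices. It checks that every row landing in the same leaf gets the same predicted probabilities, and that in this unweighted setting those probabilities equal the class proportions of the training samples in that leaf.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Illustrative data only; any labelled dataset would do.
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Index of the leaf that each training row falls into.
leaf_ids = clf.apply(X)
# Class probabilities predicted for each training row.
proba = clf.predict_proba(X)


def class_proportions(node_id):
    """Share of each class among the training rows that land in one leaf."""
    labels_in_node = y[leaf_ids == node_id]
    counts = np.bincount(labels_in_node, minlength=clf.n_classes_)
    return counts / counts.sum()


# Every row in a given leaf receives the same predicted probabilities,
# so nothing needs to be averaged.
same_within_leaf = all(
    np.allclose(proba[leaf_ids == n], proba[leaf_ids == n][0])
    for n in np.unique(leaf_ids)
)

# Without class or sample weights, the predicted probabilities equal
# the class proportions of the training rows in the leaf.
empirical = np.vstack([class_proportions(n) for n in leaf_ids])
matches_proportions = bool(np.allclose(empirical, proba))

print(same_within_leaf, matches_proportions)
```

In a setup like this the two sets of numbers coincide. If the model is trained with class or sample weights, the predicted probabilities are computed from weighted counts and can differ from the raw class proportions, which may be one reason two separate sets of percentages are displayed.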

I hope this helps!

Best regards,

Jean-Yves

