
output of a random forest classification

omar21
Level 1

Hi, I hope you are doing well.

I have a binary classification with two classes, 0 and 2.

When I test my model on another dataset, I get 3 columns in the output: the probability of being 0 (proba_0), the probability of being 2 (proba_2), and the predicted class (0 or 2). I expected the logic to be: if proba_0 > 0.5, the algorithm predicts 0, otherwise 2.

But the last row does not follow this logic, as the picture shows.

Kind regards

3 Replies
Nicolas_Servel
Dataiker

Hello,

When you use visual ML in DSS for a binary classification problem, DSS computes an optimal threshold to decide which class is selected, and this threshold is not necessarily equal to 0.5:

* How it is computed is set in the Design > Metrics tab; by default, it is chosen to optimize the F1 score

* In the model report, the "Confusion matrix" tab shows how the threshold value affects the metrics

* When running a scoring/evaluation recipe, you can either keep the model's computed threshold or override it for this run in the "Threshold" section of the recipe

 

If you wish to have a 0.5 threshold, you can change the settings of your recipe accordingly.
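
To make the idea of an F1-optimized threshold concrete, here is a minimal sketch in plain Python. It is not DSS's actual implementation: the function names, the grid of candidate thresholds, the toy data, the choice of class 2 as the positive class, and the ">=" comparison are all assumptions made for illustration.

```python
def predict_with_threshold(proba_positive, threshold, negative_label=0, positive_label=2):
    """Predict positive_label when its probability reaches the threshold (assumed rule)."""
    return [positive_label if p >= threshold else negative_label for p in proba_positive]


def f1_for_threshold(y_true, proba_positive, threshold, positive_label=2):
    """F1 score of the positive class when predictions are cut at `threshold`."""
    y_pred = predict_with_threshold(proba_positive, threshold, positive_label=positive_label)
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive_label and p == positive_label)
    fp = sum(1 for t, p in pairs if t != positive_label and p == positive_label)
    fn = sum(1 for t, p in pairs if t == positive_label and p != positive_label)
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def best_f1_threshold(y_true, proba_positive, candidates=None, positive_label=2):
    """Return the first candidate threshold that maximizes F1 (hypothetical grid search)."""
    if candidates is None:
        candidates = [i / 40 for i in range(41)]  # 0.0, 0.025, ..., 1.0
    return max(
        candidates,
        key=lambda t: f1_for_threshold(y_true, proba_positive, t, positive_label),
    )


# Toy data, invented for this example: true classes and the model's proba_2.
y_true = [0, 0, 2, 2, 2]
proba_2 = [0.10, 0.31, 0.42, 0.60, 0.90]

print(predict_with_threshold(proba_2, 0.5))    # predictions with a fixed 0.5 cut
print(f1_for_threshold(y_true, proba_2, 0.5))  # F1 at 0.5
best = best_f1_threshold(y_true, proba_2)
print(best, predict_with_threshold(proba_2, best))
```

On this toy data, the 0.5 cut misses the third row (proba_2 = 0.42), while a lower threshold recovers it. That is the kind of situation where an F1-optimized threshold ends up below 0.5, and a row with proba_0 > 0.5 can still be predicted as class 2.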

 

Hope this helps,

Best regards,

omar21
Level 1
Author

Hello,

So if I choose a threshold of 0.025, what is the probability that decides whether the prediction is 0 or 1?

Best regards

Nicolas_Servel
Dataiker

Hello,

Then the cutoff probability is the threshold itself, i.e. 0.025 (2.5%). If the predicted probability of class 1 is above it, the row is predicted 1; if it is below, the row is predicted 0.
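
As a small illustration of that rule, the sketch below applies a 0.025 cut and a 0.5 cut to the same probabilities. The helper name `predict_class`, the sample values, and the ">=" comparison at the boundary are assumptions for this example, not something taken from DSS.

```python
def predict_class(proba_1, threshold):
    """Predict 1 when the probability of class 1 reaches the threshold, else 0."""
    return 1 if proba_1 >= threshold else 0


# Invented probabilities of class 1 for five rows.
sample_probas = [0.01, 0.02, 0.03, 0.40, 0.90]

low_cut = [predict_class(p, 0.025) for p in sample_probas]
half_cut = [predict_class(p, 0.5) for p in sample_probas]

print(low_cut)   # [0, 0, 1, 1, 1]
print(half_cut)  # [0, 0, 0, 0, 1]
```

With such a low threshold, almost any row with a non-negligible probability of class 1 is predicted 1, which favors recall over precision.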

 

Hope this helps,

Best regards
