
output of a random forest classification

omar21
Level 1

Hi, I hope you are doing well.

I have a binary classification with two classes, 0 and 2.

When I test my model on another dataset, I get three columns in the output: the probability of being 0 (proba_0), the probability of being 2 (proba_2), and the predicted class (0 or 2). I would expect that if proba_0 > 0.5 the algorithm predicts 0, and otherwise 2.

However, the last row does not follow this logic, as the picture shows.

Kind regards

3 Replies
Nicolas_Servel
Dataiker

Hello,

When using visual ML in DSS for a binary classification problem, an optimal threshold for deciding which class is selected is computed, and it is not necessarily equal to 0.5:

* The way it is computed is decided in the Design > Metrics tab, and by default, it is computed to optimize the F1 score

* Then, in the report of the model, you can see the impact of the value of the threshold on the metrics in the "Confusion matrix" tab

* When running a scoring/evaluation recipe, you can either keep the computed threshold of the model, or override it for this run in the "Threshold" section of the recipe
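To illustrate the idea of a threshold chosen to optimize the F1 score, here is a minimal sketch in plain Python. It is not the DSS implementation: the function names, the candidate grid, the tie-breaking rule (keep the lowest threshold) and the validation data are all assumptions made for the example.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 score of the positive class, computed from raw counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_f1_threshold(y_true, proba_positive, candidates):
    """Return (threshold, f1) for the candidate with the highest F1.

    Assumption: on ties, the first (lowest) candidate is kept.
    """
    best_t, best_f1 = None, -1.0
    for t in candidates:
        y_pred = [1 if p >= t else 0 for p in proba_positive]
        score = f1_score(y_true, y_pred)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1


# Hypothetical validation data: true classes and predicted
# probabilities of the positive class.
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
proba_positive = [0.04, 0.11, 0.22, 0.32, 0.37, 0.41, 0.81, 0.16]
candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95

threshold, score = best_f1_threshold(y_true, proba_positive, candidates)
print(threshold, round(score, 3))
```

On this made-up data the selected threshold is well below 0.5, which is the kind of situation where a row with proba_0 > 0.5 can still be assigned the other class.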

 

If you wish to have a 0.5 threshold, you can change the settings of your recipe accordingly.

 

Hope this helps,

Best regards,

omar21
Level 1
Author

Hello,

So if I choose a threshold of 0.025, what probability decides whether the prediction is 0 or 1?

 

best regards 

Nicolas_Servel
Dataiker
Dataiker

Hello,

In that case the cutoff probability is the threshold itself, i.e. 0.025 (2.5%). If the predicted probability of class 1 is above the threshold, the row is predicted 1; if it is below, the row is predicted 0.
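As a small illustration of this rule, the sketch below applies a threshold to a few probabilities. The function name, the sample probabilities and the handling of a probability exactly equal to the threshold (predicted as the negative class here) are assumptions for the example, not taken from DSS.

```python
def predict_with_threshold(proba_positive, threshold,
                           negative_label=0, positive_label=1):
    """Predict the positive label when its probability is above the threshold.

    Assumption: a probability exactly equal to the threshold
    falls on the negative side.
    """
    return positive_label if proba_positive > threshold else negative_label


# Hypothetical probabilities of the positive class for five rows.
rows = [0.01, 0.02, 0.03, 0.40, 0.90]

at_0025 = [predict_with_threshold(p, 0.025) for p in rows]
at_05 = [predict_with_threshold(p, 0.5) for p in rows]

print(at_0025)  # [0, 0, 1, 1, 1]
print(at_05)    # [0, 0, 0, 0, 1]
```

With classes labelled 0 and 2, as in the original question, the same call would be `predict_with_threshold(p, 0.025, positive_label=2)`.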

 

Hope this helps,

Best regards
