Output of a random forest classification
Hi, I hope you are doing well.
I have a binary classification with two classes, 0 and 2.
When I test my model on another dataset, I get 3 columns in the output: the probability of being 0 (proba_0), the probability of being 2 (proba_2), and the predicted class (0 or 2). I expected the logic to be: if proba_0 > 0.5, the algorithm predicts 0, otherwise 2.
But the last row does not follow this logic, as the picture shows.
Kind regards
Answers
-
Hello,
When using visual ML in DSS for a binary classification problem, an optimal threshold for deciding which class is selected is computed, and it is not necessarily equal to 0.5:
* The way it is computed is decided in the Design > Metrics tab, and by default, it is computed to optimize the F1 score
* Then, in the report of the model, you can see the impact of the value of the threshold on the metrics in the "Confusion matrix" tab
* When running a scoring/evaluation recipe, you can either keep the computed threshold of the model, or override it for this run in the "Threshold" section of the recipe
If you wish to have a 0.5 threshold, you can change the settings of your recipe accordingly.
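To illustrate why an F1-optimized threshold can differ from 0.5, here is a minimal plain-Python sketch. It is not the DSS implementation: the data, the helper names (`f1_score`, `predict_with_threshold`, `best_f1_threshold`), the candidate grid, and the `>=` comparison are all assumptions made for the example.

```python
def f1_score(y_true, y_pred, positive=2):
    """F1 score for the positive class, computed from raw counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def predict_with_threshold(proba_positive, threshold, negative=0, positive=2):
    """Predict the positive class when its probability reaches the threshold."""
    return [positive if p >= threshold else negative for p in proba_positive]


def best_f1_threshold(y_true, proba_positive, candidates):
    """Return the first candidate threshold that gives the highest F1 score."""
    return max(
        candidates,
        key=lambda t: f1_score(y_true, predict_with_threshold(proba_positive, t)),
    )


# Made-up validation data: true classes and predicted probability of class 2.
y_true = [0, 0, 0, 2, 2, 2]
proba_2 = [0.12, 0.22, 0.47, 0.41, 0.63, 0.91]

candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
best = best_f1_threshold(y_true, proba_2, candidates)

print("best threshold:", best)
print("with best threshold:", predict_with_threshold(proba_2, best))
print("with 0.5 threshold: ", predict_with_threshold(proba_2, 0.5))
```

On this toy data the selected threshold is below 0.5, so the row with proba_2 = 0.41 (proba_0 = 0.59) is predicted as 2 even though proba_0 > 0.5. That is the same kind of row as the one in the question.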
Hope this helps,
Best regards,
-
Hello,
So if I choose a threshold of 0.025, which probability decides whether the prediction is 0 or 1?
Best regards
-
Hello,
Then the limit probability is the threshold itself, i.e. 0.025 (2.5%). If the predicted probability of class 1 is above it, the prediction is 1; if it is below, the prediction is 0.
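As a small sketch of that rule, assuming the threshold is compared against the probability of class 1 (the function name `predict_class` and the sample probabilities are made up for illustration):

```python
def predict_class(proba_1, threshold=0.025):
    """Return 1 when the probability of class 1 is above the threshold, else 0."""
    # Assumption: a strict "above" comparison, as described in the answer.
    return 1 if proba_1 > threshold else 0


for proba_1 in (0.01, 0.03, 0.40, 0.90):
    proba_0 = 1 - proba_1
    print(f"proba_0={proba_0:.2f} proba_1={proba_1:.2f} -> {predict_class(proba_1)}")
```

With such a low threshold, a row is predicted as 1 even when proba_0 is as high as 0.97; only rows with proba_1 below 0.025 are predicted as 0.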
Hope this helps,
Best regards