Output of a random forest classification

omar21
omar21 Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 18 ✭✭✭✭

Hi, I hope you are doing well.

I have a binary classification with two classes, 0 and 2.

When I test my model on another dataset, I get three columns in the output: the probability of class 0 (proba_0), the probability of class 2 (proba_2), and the predicted class (0 or 2). I expected that if proba_0 > 0.5 the algorithm predicts 0, and otherwise 2.

But the last row does not follow this logic, as the picture shows.

Kind regards

Answers

  • Nicolas_Servel
    Nicolas_Servel Dataiker Posts: 37 Dataiker

    Hello,

    When you use visual ML in DSS for a binary classification problem, DSS computes an optimal threshold to decide which class is predicted, and this threshold is not necessarily equal to 0.5:

    * The way it is computed is decided in the Design > Metrics tab, and by default, it is computed to optimize the F1 score

    * Then, in the report of the model, you can see the impact of the value of the threshold on the metrics in the "Confusion matrix" tab

    * When running a scoring/evaluation recipe, you can either keep the computed threshold of the model, or override it for this run in the "Threshold" section of the recipe

    If you wish to have a 0.5 threshold, you can change the settings of your recipe accordingly.
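To illustrate the idea, here is a minimal sketch in plain Python of how a threshold turns a probability into a class, and how a threshold could be chosen to maximize the F1 score. It is not the DSS implementation: the function names, the toy data, the candidate thresholds, and the choice of class 2 as the "positive" class are all assumptions made for the example.

```python
# Illustrative sketch only; not how DSS computes its threshold internally.
# Assumption: class 2 is the "positive" class and proba_2 is its probability.

def predict_with_threshold(proba_positive, threshold, negative=0, positive=2):
    """Return the positive label when its probability exceeds the threshold."""
    return positive if proba_positive > threshold else negative


def f1_score(y_true, y_pred, positive=2):
    """Compute the F1 score of the positive class from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_f1_threshold(y_true, proba_positive, candidates):
    """Pick the candidate threshold with the highest F1 score."""
    def score(threshold):
        preds = [predict_with_threshold(p, threshold) for p in proba_positive]
        return f1_score(y_true, preds)
    return max(candidates, key=score)


# Toy data (hypothetical): true classes and the model's proba_2 values.
y_true = [0, 0, 2, 2, 2]
proba_2 = [0.10, 0.40, 0.35, 0.45, 0.90]

best = best_f1_threshold(y_true, proba_2, candidates=[0.3, 0.4, 0.5])
print("best threshold:", best)

# A row with proba_2 = 0.45 is predicted 0 at a 0.5 cutoff,
# but 2 once the cutoff is lowered to the F1-optimal value.
print(predict_with_threshold(0.45, 0.5), predict_with_threshold(0.45, best))
```

On this toy data the F1-optimal cutoff is below 0.5, so a row whose proba_0 is above 0.5 can still be predicted as class 2, which is the kind of row described in the question.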

    Hope this helps,

    Best regards,

  • omar21
    omar21 Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 18 ✭✭✭✭

    Hello,

    So if I choose a threshold of 0.025, what is the probability that decides whether the prediction is 0 or 1?

    Best regards

  • Nicolas_Servel
    Nicolas_Servel Dataiker Posts: 37 Dataiker

    Hello,

    Then the cutoff probability is the threshold itself, i.e. 0.025 (2.5%). If the predicted probability of class 1 is above the threshold, the row is predicted as 1; if it is below, the row is predicted as 0.
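As a small worked example of that rule, the sketch below applies a 0.025 cutoff to a few made-up probabilities of class 1. The function name and the sample values are hypothetical, and treating a probability exactly equal to the threshold as class 0 is an assumption of this sketch, not something stated above.

```python
# Illustrative sketch: apply a 0.025 threshold to the probability of class 1.
THRESHOLD = 0.025


def predict(proba_1, threshold=THRESHOLD):
    """Predict 1 when the probability of class 1 is above the threshold."""
    # Assumption: a probability exactly at the threshold is predicted as 0.
    return 1 if proba_1 > threshold else 0


# Hypothetical probabilities of class 1 for four rows.
probabilities = [0.01, 0.02, 0.03, 0.60]
predictions = [predict(p) for p in probabilities]

for p, label in zip(probabilities, predictions):
    print(f"proba_1={p:.2f} -> prediction {label}")
```

With such a low cutoff, even a row with only a 3% probability of class 1 is predicted as 1.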

    Hope this helps,

    Best regards
