How does the evaluation store threshold actually work?

Jason Registered Posts: 35 ✭✭✭✭✭

In the documentation for the evaluation store, when doing two-class (binary) classification, there is a slider for the threshold used. The documentation for this threshold reads, in part:

When doing binary classification, most models don’t output a single binary answer, but instead a continuous “score of being positive”. You then need to select a threshold on this score, above which DSS will consider the sample as positive. This threshold for scoring the target class is optimized according to the selected metric

I read that to mean when the predicted value is above the threshold, it is considered positive.
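To illustrate my expectation, here is a rough sketch in plain Python (this is not Dataiku code, just the usual convention I had in mind): scores above the cutoff are labeled positive, so raising the cutoff should yield FEWER positives.

```python
import numpy as np

scores = np.array([0.10, 0.35, 0.55, 0.80, 0.95])  # predicted P(True) for five samples

for threshold in (0.3, 0.5, 0.7):
    # Samples scoring above the threshold are called positive
    positives = int((scores > threshold).sum())
    print(f"threshold={threshold:.1f} -> {positives} positive predictions")

# threshold=0.3 -> 4 positive predictions
# threshold=0.5 -> 3 positive predictions
# threshold=0.7 -> 2 positive predictions
```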

However, when I move the slider to the right, I get MORE predictions called positive. These two things seem to contradict each other: why would moving the threshold UP produce more positives?

The reason I ask is that, in practice, I have Python code that discovers the cutoff that will be used in production. I would like to enter that cutoff manually (the evaluation store recipe's configuration lets me specify it), which should make the evaluation store's results match the implementation code I am using. However, they currently do not give the same results.
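For context, here is a minimal sketch of the kind of cutoff search my production code does (the F1 metric, names, and dummy data are placeholders, not my actual code):

```python
import numpy as np
from sklearn.metrics import f1_score

def find_cutoff(y_true, proba_true, candidates=np.linspace(0.01, 0.99, 99)):
    """Return the candidate cutoff whose hard predictions maximize F1."""
    scores = [f1_score(y_true, (proba_true > c).astype(int)) for c in candidates]
    return candidates[int(np.argmax(scores))]

# Dummy validation data, just to make the sketch runnable
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
proba_true = np.clip(y_true * 0.6 + rng.normal(0.3, 0.2, size=200), 0, 1)

best = find_cutoff(y_true, proba_true)
print(f"best cutoff by F1: {best:.2f}")
```

The idea is that this number would then be entered as the manual threshold in the evaluation store recipe, and the two should agree.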

My expectation is that moving the threshold up would result in fewer positive calls, not more.

Operating system used: Windows 11

Best Answer

  • Jason Registered Posts: 35 ✭✭✭✭✭
    Answer ✓

    I guess I'm answering my own question here. After posting the original question, I realized the answer is simple:

    In my case, I have a custom Python model, and the evaluation stores are built against the output of that custom model. When I set up the evaluation store, I defined the probabilities of the True and False classes, and I had been defining the probability of True first, followed by False. Because I wanted the calibration curves, I was ticking the "probability aware" box, then providing the predicted probability for the True class and (1 − that probability) for the False class. Other items on this configuration page are defined with True first, then False, so I just followed the pattern. It turns out that if I define False first, then True, it fixes the directionality of the slider.
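    For anyone hitting the same thing, here is a rough sketch of how I prepare the two probability columns from the custom model's output (the column names are illustrative; the actual fix is simply the order in which the classes are declared in the recipe settings):

    ```python
    import pandas as pd

    def add_probability_columns(df, proba_true_col="proba_True"):
        """Add the complementary False-class probability column alongside the True one."""
        out = df.copy()
        out["proba_False"] = 1.0 - out[proba_true_col]
        return out

    # Example scored output from the custom model
    scored = pd.DataFrame({"proba_True": [0.10, 0.62, 0.91],
                           "prediction": [False, True, True]})
    print(add_probability_columns(scored))
    ```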

    As an aside, for any Dataiku people reading this: could we please get an option for binary classifiers that sets all of this up automatically from the probability of the True class, without having to configure the rest?
