How does the evaluation store threshold actually work?


In the documentation for the evaluation store, when doing a two-class (binary) classification, there is a slider for the threshold used. The documentation for this threshold reads in part:

When doing binary classification, most models don’t output a single binary answer, but instead a continuous “score of being positive”. You then need to select a threshold on this score, above which DSS will consider the sample as positive. This threshold for scoring the target class is optimized according to the selected metric

I read that to mean when the predicted value is above the threshold, it is considered positive.
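
To make my reading concrete, here is a tiny sketch (the scores are made up, not from my model) of the behaviour I would expect:

```python
import numpy as np

# Hypothetical continuous "scores of being positive" from a binary classifier.
scores = np.array([0.10, 0.35, 0.55, 0.72, 0.90])

# Standard thresholding: a sample is called positive when its score is at or
# above the threshold, so raising the threshold can only shrink the set of
# positive calls, never grow it.
for threshold in (0.3, 0.5, 0.7):
    positives = (scores >= threshold).sum()
    print(f"threshold={threshold}: {positives} positive calls")
# threshold=0.3: 4 positive calls
# threshold=0.5: 3 positive calls
# threshold=0.7: 2 positive calls
```

In other words, raising the threshold should only ever reduce the number of positive calls.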

However, when I move the slider to the right, I get MORE predictions called positive. These two things seem to contradict each other. Why would moving the threshold UP cause more positives?

The reason I ask is that, in practice, I have Python code that discovers the cutoff that will be used in production, but I would like to manually enter that cutoff number (the evaluation store recipe's configuration lets me specify it) so that the evaluation store produces the same outcomes as my implementation code. However, they currently do not seem to give the same results.
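
For context, the cutoff-discovery step works roughly like the sketch below. This is only an illustration of the idea (picking the threshold that maximizes a chosen metric, here F1, on a validation set); my actual production code differs:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def find_cutoff(y_true, y_score):
    """Pick the score threshold that maximizes F1 on a validation set.

    y_true  : array of 0/1 ground-truth labels
    y_score : array of predicted probabilities for the positive (True) class
    """
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # precision_recall_curve returns one more precision/recall point than
    # thresholds, so drop the last point before computing F1 per threshold.
    f1 = 2 * precision[:-1] * recall[:-1] / np.clip(
        precision[:-1] + recall[:-1], 1e-12, None
    )
    return thresholds[np.argmax(f1)]

# A sample is then called positive when its score is at or above the cutoff:
# y_pred = (y_score >= cutoff)
```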

My expectation is that moving the threshold up would result in fewer positive calls, not more.

Operating system used: Windows 11

Best Answer


I guess I'm answering my own question here. After posting the original question, I realized the answer is simple:

In my case, I have a custom Python model, and the evaluation stores are built against the output of that custom model. When I set up the evaluation store, I defined the probabilities of the True and False classes, and I had been listing the probability of True followed by False. Because I wanted the calibration curves, I was ticking the box for probability-aware, then providing the True-class probability and (1 - True-class probability) for the False class. Other items on this configuration page are defined with True first, then False, so I just followed that pattern. It turns out that if I define False first and then True, it fixes the directionality of the slider.
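
For anyone hitting the same thing, the data side of my setup looks roughly like the sketch below. The column names are my own illustration, not anything DSS requires; the point is that the False-class probability is just the complement of the True-class probability, and only the class-to-column mapping in the evaluation store configuration needed to change:

```python
import pandas as pd

# p_true is the custom model's probability of the True class.
scored = pd.DataFrame({
    "prediction": ["True", "False", "True"],
    "p_true": [0.91, 0.12, 0.67],
})

# The probability-aware setup needs one probability column per class;
# the False-class probability is simply the complement.
scored["p_false"] = 1.0 - scored["p_true"]

# The fix was purely about which class each column is mapped to in the
# evaluation store configuration (False first, then True in my case);
# the numbers themselves do not change.
```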

As an aside, for any Dataiku people here: can we please add an option for a binary classifier that would set all of this up automatically from the probability of the True class, without having to configure the rest?
