ML_DIAGNOSTICS_DATASET_SANITY_CHECKS warning

kimerik · 02-29-2024

Hi!

In some of our standalone evaluations, we're getting the warning below. Does anyone know what's causing it and how to fix it?

[WARN] [dku.warnings] - ML_DIAGNOSTICS_DATASET_SANITY_CHECKS: Prediction uses 1 of the 2 configured classes.

AlexT · 03-03-2024

Hi @kimerik,
This is just a warning. It means that all of the evaluated rows were predicted as one of the classes, which may be expected depending on the data in your evaluation datasets.
You should review your evaluation dataset and make sure it contains both classes.
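For illustration, here is a minimal Python sketch of the kind of check the warning text describes: it compares the distinct predicted labels against the configured classes. The function name `prediction_class_warning` is hypothetical and this is not Dataiku's actual implementation; it only mirrors the wording of the log line.

```python
def prediction_class_warning(predictions, configured_classes):
    """Return a warning string if the predictions cover fewer classes than
    configured, else None.

    Hypothetical helper for illustration only; not Dataiku's implementation.
    """
    configured = list(configured_classes)
    # Configured classes that actually appear among the predicted labels
    used = set(predictions) & set(configured)
    if len(used) < len(configured):
        return (f"Prediction uses {len(used)} of the "
                f"{len(configured)} configured classes.")
    return None
```

In this sketch the check only looks at predicted labels, so `prediction_class_warning([0, 0, 0], [0, 1])` would return the warning even if the actual labels of those rows were `[0, 1, 0]`.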

kimerik · 03-04-2024

Hi @AlexT,

Sorry for not specifying this in my previous post, but in the cases where I've encountered this, the datasets have included both classes (see e.g. the log below). But maybe the check uses sampled data, and one of the classes is rare enough that it is not included?
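As a rough sanity check on that hypothesis: if a class makes up a proportion p of the rows and the sample is n rows drawn uniformly at random, the chance that the sample contains none of that class is approximately (1 - p)^n, treating draws as independent (for sampling without replacement the true value is a bit lower). The helper below is a hypothetical illustration, not part of any Dataiku API.

```python
import math


def prob_class_missing(p, n):
    """Approximate probability that a uniform random sample of n rows contains
    no row of a class that makes up proportion p of the data.

    Hypothetical helper; treats draws as independent (with replacement).
    """
    return (1.0 - p) ** n


# A class at 1 in 10,000 rows, with a 100k-row random sample
print(prob_class_missing(1e-4, 100_000))
# Compare with the exponential approximation exp(-p * n)
print(math.exp(-1e-4 * 100_000))
```

For n = 100,000 and a class at 1 in 10,000 rows, this is about e^-10 (roughly 4.5 × 10^-5), so a uniformly random sample of that size would rarely miss the class. A first-N-rows sample is not random, though: if the data is ordered so that the minority rows come late, it can miss them entirely.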

[07:51:09] [INFO] [dku.utils]  - [2024-03-04 07:51:09,418] [12/MainThread] [INFO] [dataiku.doctor.prediction.classification_scoring] Actual shape = (337468,)
[07:51:09] [INFO] [dku.utils]  - [2024-03-04 07:51:09,420] [12/MainThread] [INFO] [dataiku.doctor.prediction.classification_scoring] Class proba shape (1403,)

AlexT · 03-04-2024

Indeed, if the sample (the first 100k rows) doesn't contain both classes, the warning can appear. You can try re-balancing the sample, e.g. with random or class-rebalance sampling, so that both classes are visible, which should avoid the warning. If you are still seeing it after that, please open a support ticket so we can investigate further.
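The difference between taking the first N rows and a class-rebalanced sample can be sketched in plain Python. Both functions (`first_n`, `class_rebalance`) are hypothetical illustrations that only approximate the idea behind class-rebalance sampling; Dataiku's own sampling settings may behave differently.

```python
import random
from collections import defaultdict


def first_n(rows, n):
    """Take the first n rows as they are ordered (no randomness)."""
    return rows[:n]


def class_rebalance(rows, label_of, n, seed=0):
    """Hypothetical sketch: draw up to n rows with roughly equal counts per class.

    Each class contributes at most n // (number of classes) rows; smaller
    classes contribute all of their rows.
    """
    if not rows:
        return []
    # Group rows by their class label
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    rng = random.Random(seed)
    per_class = n // len(by_class)
    sample = []
    for members in by_class.values():
        k = min(per_class, len(members))
        # Random draw without replacement within each class
        sample.extend(rng.sample(members, k))
    # Shuffle so the classes are interleaved in the final sample
    rng.shuffle(sample)
    return sample
```

With `rows = [0] * 1000 + [1] * 10` (majority class sorted first), `first_n(rows, 100)` contains only class 0, while `class_rebalance(rows, lambda r: r, 100)` keeps all 10 rows of class 1 alongside 50 rows of class 0.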

kimerik · 03-07-2024

Neither of the following worked:
- performing a sorting step with the minority class on top
- sampling 100k observations with class-rebalance

So I will create a support ticket.
