ML_DIAGNOSTICS_DATASET_SANITY_CHECKS warning

kimerik
Level 2
Hi!

In some of our standalone evaluations we're getting the warning below. Does anyone know what's causing it and how to fix it?

[WARN] [dku.warnings] - ML_DIAGNOSTICS_DATASET_SANITY_CHECKS: Prediction uses 1 of the 2 configured classes.

AlexT
Dataiker

Hi @kimerik,
This is just a warning, meaning that all of the evaluated rows were from only one of the classes. This may be expected, depending on the data in your evaluation datasets.
You will need to review your evaluation dataset and make sure it contains both classes.
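A minimal sketch of the kind of check the warning text describes, assuming it simply compares the set of predicted labels against the configured class list. This is not Dataiku's actual implementation; `count_used_classes` and the example labels are hypothetical.

```python
from collections import Counter


def count_used_classes(predictions, configured_classes):
    """Return (number of configured classes that appear, per-class counts).

    Hypothetical helper: it only mimics the idea behind the
    "Prediction uses X of the Y configured classes" message.
    """
    counts = Counter(p for p in predictions if p in configured_classes)
    return len(counts), dict(counts)


configured = ["no", "yes"]
predictions = ["no"] * 1000  # every row predicted as the same class

used, counts = count_used_classes(predictions, configured)
if used < len(configured):
    print(f"Prediction uses {used} of the {len(configured)} configured classes: {counts}")
```

With only one label present, the sketch reports 1 of 2 classes, which mirrors the message in the log.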

kimerik
Level 2
Author

Hi @AlexT,

Sorry for not specifying this in my previous post, but in the cases where I've encountered this, the datasets have included both classes (see e.g. the log below). But maybe the check uses sampled data, and one of the classes is rare enough not to be included?

[07:51:09] [INFO] [dku.utils]  - [2024-03-04 07:51:09,418] [12/MainThread] [INFO] [dataiku.doctor.prediction.classification_scoring] Actual shape = (337468,)
[07:51:09] [INFO] [dku.utils]  - [2024-03-04 07:51:09,420] [12/MainThread] [INFO] [dataiku.doctor.prediction.classification_scoring] Class proba shape (1403,)
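To illustrate the hypothesis above: if a sample is taken from the top of the dataset and the rows happen to be ordered so that the rare class comes last, the sample can contain only one class even though the full dataset has both. The dataset sizes below are made up for illustration, and `head_sample` is a hypothetical stand-in for a "first records" style sample.

```python
def head_sample(rows, n):
    """Take the first n rows, similar in spirit to a 'first records' sample."""
    return rows[:n]


def classes_in(rows):
    """Set of distinct labels in a list of (id, label) rows."""
    return {label for _, label in rows}


# Hypothetical dataset: 300,000 rows with 1,200 minority rows, ordered so that
# all minority rows come last (e.g. sorted by date or by ID).
n_rows = 300_000
n_minority = 1_200
rows = [(i, "majority") for i in range(n_rows - n_minority)]
rows += [(i, "minority") for i in range(n_rows - n_minority, n_rows)]

sample = head_sample(rows, 100_000)
print("classes in full data:", sorted(classes_in(rows)))
print("classes in first 100k:", sorted(classes_in(sample)))
```

Here the full data has both classes, while the first 100k rows contain only `majority`.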

AlexT
Dataiker

Indeed, if the sample (the first 100k rows) doesn't contain both classes, the warning can appear. You can try to rebalance the sample, e.g. with random or class-rebalance sampling, so that both classes are visible and the warning is avoided. If you are still seeing this afterwards, please open a support ticket so we can investigate further.
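A rough sketch of what class-rebalance sampling aims for: draw up to `n` rows while giving each class the same quota, so a rare class is not crowded out. This is an assumption about the general idea, not Dataiku's sampling algorithm; `class_rebalance_sample` and the toy dataset are hypothetical.

```python
import random
from collections import Counter, defaultdict


def class_rebalance_sample(rows, label_of, n, seed=0):
    """Draw up to n rows, aiming for the same number of rows per class.

    Hypothetical sketch: a class smaller than its quota contributes all of
    its rows, and the unused quota is not redistributed to other classes.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    quota = n // len(by_class)
    sample = []
    for members in by_class.values():
        sample.extend(rng.sample(members, min(quota, len(members))))
    rng.shuffle(sample)  # avoid returning rows grouped by class
    return sample


# Same toy layout as a sorted dataset: minority rows all at the end.
rows = [(i, "majority") for i in range(298_800)]
rows += [(i, "minority") for i in range(298_800, 300_000)]

sample = class_rebalance_sample(rows, label_of=lambda r: r[1], n=100_000)
counts = Counter(label for _, label in sample)
print(dict(sorted(counts.items())))
```

Each class gets a quota of 50,000; the minority class only has 1,200 rows, so all of them are kept and the sample is guaranteed to contain both classes.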

kimerik
Level 2
Author

Neither of the following worked:
- performing a sorting step with the minority class on top, or
- sampling 100k observations with class-rebalance.

So I will create a support ticket.
