ML_DIAGNOSTICS_DATASET_SANITY_CHECKS warning
Hi!
In some of our standalone evaluations we're getting the warning below. Does anyone know what's causing it and how to fix it?
[WARN] [dku.warnings] - ML_DIAGNOSTICS_DATASET_SANITY_CHECKS: Prediction uses 1 of the 2 configured classes.
Answers
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker
Hi @kimerik,
This is just a warning: it means that only one of the two configured classes showed up in the evaluation. Depending on the data in your evaluation datasets, this may be expected.
You will need to review your evaluation dataset and make sure it contains both classes.
-
Hi @AlexT,
Sorry for not specifying this in my previous post, but in the cases where I've encountered this, the datasets have included both classes (see e.g. the log below). But maybe the check uses sampled data, and one of the classes is rare enough not to be included in the sample?
[07:51:09] [INFO] [dku.utils] - [2024-03-04 07:51:09,418] [12/MainThread] [INFO] [dataiku.doctor.prediction.classification_scoring] Actual shape = (337468,)
[07:51:09] [INFO] [dku.utils] - [2024-03-04 07:51:09,420] [12/MainThread] [INFO] [dataiku.doctor.prediction.classification_scoring] Class proba shape (1403,)
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker
Indeed, if the sample (the first 100k rows) doesn't contain both classes, the warning can appear. You can try re-balancing the sample, e.g. with random or class-rebalance sampling, so that both classes are visible and the warning goes away. If you still see it after that, please open a support ticket so we can investigate further.
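To check whether the first 100k rows really are missing a class, something like the following could help. It is only a rough sketch in plain Python, not the check DSS runs internally. The helper names (`class_counts`, `classes_missing_from_head`) and the toy `labels` list are made up for illustration. In practice `labels` would be the target column of the evaluation dataset, in dataset order.

```python
from collections import Counter
from itertools import islice


def class_counts(labels, n_rows=None):
    """Count each class among the first n_rows labels (all labels if n_rows is None)."""
    return Counter(islice(labels, n_rows))


def classes_missing_from_head(labels, n_rows=100_000):
    """Return the classes that occur somewhere in labels but not in the first n_rows."""
    labels = list(labels)
    return set(labels) - set(labels[:n_rows])


# Toy data (hypothetical): the minority class only appears after row 5.
labels = [0] * 5 + [1] * 2

print(class_counts(labels, n_rows=5))               # counts within the head sample
print(classes_missing_from_head(labels, n_rows=5))  # classes the head sample never sees
```

If the second call returns an empty set for the real data, the head sample does contain both classes. It may then be worth running `class_counts` on the predicted labels as well, since the warning text refers to the prediction.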
-
Neither:
- performing a sorting step with the minority class on top, nor
- sampling 100k observations with class-rebalance
worked.
So I will create a support ticket.