ML_DIAGNOSTICS_DATASET_SANITY_CHECKS warning

kimerik Registered Posts: 3
edited July 16 in Using Dataiku

Hi!

In some of our standalone evaluations we're getting the warning below. Does anyone know what causes it and how to fix it?

[WARN] [dku.warnings] - ML_DIAGNOSTICS_DATASET_SANITY_CHECKS: Prediction uses 1 of the 2 configured classes.

Answers

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,225 Dataiker

    Hi @kimerik,
    This is only a warning. It means that only one of the two configured classes showed up in the evaluation. That may be expected, depending on the data in the evaluation datasets.
    Please review your evaluation dataset and make sure it contains both classes.
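
    One way to do that review outside of DSS is to count how often each configured class appears in the target column and in the predictions. The snippet below is only a minimal sketch using plain Python lists: `class_coverage`, the `"no"`/`"yes"` labels, and the toy data are hypothetical stand-ins, not part of Dataiku's API or of the original poster's data.

```python
from collections import Counter


def class_coverage(labels, configured_classes):
    """Return per-class counts and the configured classes that never appear."""
    counts = Counter(labels)
    # Counter returns 0 for labels it has never seen, so absent classes are easy to spot.
    missing = [c for c in configured_classes if counts[c] == 0]
    return {c: counts[c] for c in configured_classes}, missing


# Hypothetical toy data standing in for the target column and the predicted labels.
actual = ["no", "no", "yes", "no"]
predicted = ["no", "no", "no", "no"]

actual_counts, actual_missing = class_coverage(actual, ["no", "yes"])
pred_counts, pred_missing = class_coverage(predicted, ["no", "yes"])

print("actual:", actual_counts, "missing:", actual_missing)
print("predicted:", pred_counts, "missing:", pred_missing)
```

    If either list of missing classes is non-empty, only one class is represented in that column, which would be consistent with the warning text.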

  • kimerik
    kimerik Registered Posts: 3
    edited July 17

    Hi @AlexT,

    Sorry for not specifying this in my previous post, but in the cases where I've encountered this, the datasets have included both classes (see e.g. the log below). But maybe the check uses sampled data, and one of the classes is rare enough that it is not included in the sample?

    [07:51:09] [INFO] [dku.utils]  - [2024-03-04 07:51:09,418] [12/MainThread] [INFO] [dataiku.doctor.prediction.classification_scoring] Actual shape = (337468,)
    [07:51:09] [INFO] [dku.utils]  - [2024-03-04 07:51:09,420] [12/MainThread] [INFO] [dataiku.doctor.prediction.classification_scoring] Class proba shape (1403,)

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,225 Dataiker

    Indeed, if the sample (the first 100k rows) doesn't contain both classes, the warning can appear. You can try re-balancing the sample, e.g. with random or class-rebalance sampling, so that both classes are visible to the check. If you still see the warning after that, please open a support ticket so we can investigate further.
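
    To test whether a head sample could be missing a class, one can look at which labels occur in the first N rows and at which row each class first appears. This is a rough sketch in plain Python, not Dataiku code: the helper names, the 0/1 labels, and the synthetic 150,000-row column are all assumptions made for illustration.

```python
def classes_in_head(labels, n_rows):
    """Return the set of distinct labels among the first n_rows entries."""
    return set(labels[:n_rows])


def first_occurrence(labels):
    """Map each label to the index of the row where it first appears."""
    first = {}
    for i, label in enumerate(labels):
        first.setdefault(label, i)  # keeps only the earliest index per label
    return first


# Hypothetical target column: the minority class (1) only shows up after row 120,000.
labels = [0] * 120_000 + [1] * 10 + [0] * 29_990

head_classes = classes_in_head(labels, 100_000)
first_seen = first_occurrence(labels)

print("classes in first 100k rows:", head_classes)
print("first row per class:", first_seen)
```

    In this synthetic case the first 100,000 rows contain only class 0, even though the full column contains both classes.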

  • kimerik
    kimerik Registered Posts: 3

    Neither of these worked:
    - adding a sort step that puts the minority class on top, or
    - sampling 100k observations with class-rebalance.

    So I will create a support ticket.
