Classification error : Ended up with only one class in the test set. Cannot proceed

taloh90 Registered Posts: 6 ✭✭✭

Hello,

I encountered an error when I started training my classification models. The error is: "Ended up with only one class in the test set. Cannot proceed"

I have an imbalanced dataset and I would like to train a classifier for a comparative analysis. I have already split my dataset manually, 70% for training and 30% for testing. In the design of my model, I selected them with the policy "Explicit extracts from two datasets".

If I understand correctly, the error comes from the cross-validation used for the hyperparameter search. Do I have to write custom code to solve my problem, for example Leave-One-Out? Or is there another solution? I confess I have only a very basic knowledge of Python.

Best Answer

  • AlexandreL
    AlexandreL Dataiker Posts: 36 Dataiker
    Answer ✓

    Hi,

    The issue here seems to be that your dataset is so imbalanced that one of the cross-validation folds contains only observations from one class. Could you please tell me what your positive class proportion is? Before going to Python code, you can try the following options:

    - shuffle your dataset so that your positive observations are evenly distributed in it (you can add a pre-computed random column whose formula is just rand(), then use a Sort recipe to sort by this column)

    - use fewer folds for your cross-validation
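
    To illustrate why a fold can end up with a single class, here is a minimal sketch in Python with scikit-learn, outside of DSS. The labels are made up, and the stratified splitter is an addition for illustration, not one of the options listed above and not a description of what DSS does internally.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical toy labels: 16 negatives followed by 4 positives,
# i.e. an imbalanced dataset that is also sorted by class.
y = np.array([0] * 16 + [1] * 4)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder feature column


def single_class_folds(splitter, X, y):
    """Count the test folds that contain observations from only one class."""
    return int(sum(len(np.unique(y[test_idx])) < 2
                   for _, test_idx in splitter.split(X, y)))


# Contiguous, unshuffled folds on sorted data: every fold is a block of
# 4 consecutive rows, so each one holds a single class.
unshuffled = single_class_folds(KFold(n_splits=5), X, y)

# Shuffled and stratified folds, with no more folds than there are rows
# in the smallest class: every fold receives both classes.
stratified = single_class_folds(
    StratifiedKFold(n_splits=4, shuffle=True, random_state=0), X, y
)

print(unshuffled, stratified)
```

    Shuffling alone makes a single-class fold less likely, and fewer folds make each fold larger; stratification is what guarantees both classes in every fold, provided the fold count does not exceed the size of the smallest class.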

    Hope this solves your issue,

    Alex

Answers

  • taloh90
    taloh90 Registered Posts: 6 ✭✭✭

    Hi,

    Thanks for the suggested options. I used fewer folds for my cross-validation as you suggested, and it works.

    In the training part of my dataset, the proportion is 13 for the positive class and 4 for the negative class. In the test part, the proportion is 6 for the positive class and 1 for the negative class.

    So in the design part of my classification model, I selected only the training part of my dataset and used 2 folds. I then deployed the model with the best score to my flow and used it to classify my test dataset.
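
    As a rough sanity check for this kind of setup, the sketch below caps the number of folds at the size of the smallest class. It assumes the figures above are row counts (13 positive and 4 negative rows in training, 6 and 1 in test), which the post does not confirm. The names `max_safe_folds` and `has_both_classes` are hypothetical helpers, not part of any library.

```python
from collections import Counter


def max_safe_folds(labels, requested_folds):
    """Cap the fold count at the size of the smallest class."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("labels contain a single class")
    smallest = min(counts.values())
    if smallest < 2:
        raise ValueError("smallest class has fewer than 2 rows")
    return min(requested_folds, smallest)


def has_both_classes(labels):
    """Return True when at least two distinct classes are present."""
    return len(set(labels)) >= 2


# Assumed row counts, read from the proportions quoted in the post.
train_labels = [1] * 13 + [0] * 4
test_labels = [1] * 6 + [0] * 1

folds_for_5 = max_safe_folds(train_labels, 5)  # capped by the 4 negative rows
folds_for_2 = max_safe_folds(train_labels, 2)  # 2 folds is already within the cap
test_ok = has_both_classes(test_labels)

print(folds_for_5, folds_for_2, test_ok)
```

    The cap only guarantees both classes in every fold when the split is stratified; with plain random folds, a fold can still miss the minority class by chance.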
