Academy Discussions

  • Hello, I am working on the hands-on exercise for tuning a model and I need to understand something: why do we need to re-balance the model when Dataiku DSS has already divided our samples into 80% …
    Answered ✓
    Started by ManarAlmutairi
    Most recent by Sean
    Solution by Sean

    Hi @ManarAlmutairi, this is a good question. In this case, Dataiku has done an 80/20 split from the first 100k rows. Are these 100k rows a good representation of the dataset? We probably don't want to make that assumption. Maybe the first 100k records do not have any high revenue customers!
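
    To make the "first rows" concern concrete, here is a minimal sketch in plain Python. It is not Dataiku code. The dataset is hypothetical: 10,000 customers, 5% of them high revenue, stored so that all high revenue rows sit at the end of the file. The row counts, the `labels` list, and the `positive_rate` helper are assumptions made up for illustration.

```python
import random


def positive_rate(labels):
    """Fraction of rows labeled 1 (high revenue)."""
    return sum(labels) / len(labels)


# Hypothetical data: 10,000 customers, 5% high revenue (label 1),
# ordered so that every high revenue row sits at the end.
labels = [0] * 9_500 + [1] * 500

rng = random.Random(0)  # fixed seed so the run is repeatable

head_sample = labels[:1_000]               # "first N rows" sampling
random_sample = rng.sample(labels, 1_000)  # random sampling over all rows

full_rate = positive_rate(labels)
head_rate = positive_rate(head_sample)
random_rate = positive_rate(random_sample)

print(f"full dataset : {full_rate:.3f}")
print(f"first 1,000  : {head_rate:.3f}")
print(f"random 1,000 : {random_rate:.3f}")
```

    With this ordering, the first 1,000 rows contain no high revenue customers at all, while a random sample lands close to the true 5%. Real data is rarely sorted this cleanly, but the same effect shows up in milder form whenever row order correlates with the target.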

    This is particularly acute when we know we have a class imbalance (many more "not high revenue" customers than high revenue customers), which is the reality for many ML use cases.

    So when training the model, we want to try to make sure we have a more accurate representation of high revenue customers in the train and test sets.
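
    One common way to keep the class mix the same in the train and test sets is a stratified split, where each class is split 80/20 separately. The sketch below uses only the Python standard library and is not how Dataiku implements its sampling or splitting. The `stratified_split` function, its parameters, and the 950/50 toy labels are hypothetical names and numbers chosen for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), splitting each class at the same ratio."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # randomize order within the class
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


# Hypothetical imbalanced data: 950 "not high revenue" (0), 50 high revenue (1).
labels = [0] * 950 + [1] * 50

train_idx, test_idx = stratified_split(labels)
train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)

print(len(train_idx), len(test_idx))
print(f"train high revenue rate: {train_rate:.3f}")
print(f"test high revenue rate : {test_rate:.3f}")
```

    Stratifying preserves the original class proportions in both sets. It does not change those proportions, so it is a different step from rebalancing, which deliberately over- or under-samples a class.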

    Class imbalance is an important ML concept (whether using Dataiku or not). You might want to look for external sources to supplement your understanding. This article might be one place to start.
