Academy Discussions

ML basics - Hands on: Tune the model
Hello, I am working on the hands-on exercise for tuning a model and I need to understand something: why do we need to re-balance the model when Dataiku DSS has already divided our samples into 80% …
Answered ✓
ml practitioner
ML BASICS
HANDS-ON
Started by ManarAlmutairi
Most recent by Sean
Aug 29, 2022
Solution by Sean
Hi @ManarAlmutairi, this is a good question. In this case, Dataiku has done an 80/20 split from the first 100k rows. Are these 100k rows a good representation of the dataset? We probably don't want to make that assumption. Maybe the first 100k records do not have any high revenue customers!
This is particularly acute when we know we have a class imbalance (many more "not high revenue" customers than high revenue customers), which is the reality for many different ML use cases.
So when training the model, we want to try to make sure we have a more accurate representation of high revenue customers in the train and test sets.
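To make the point concrete outside of Dataiku, here is a minimal Python sketch. It uses made-up data (1,000 rows, 10% "high revenue", all sitting at the end of the dataset) and a hypothetical `stratified_split` helper written for this illustration. It shows a stratified split, which is one common way to keep class proportions consistent; it is not Dataiku's own sampling or class rebalancing implementation.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split row indices so each class keeps roughly the same share in train and test."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle within each class before cutting
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def positive_rate(labels, indices):
    """Fraction of rows in `indices` that belong to the positive class (label 1)."""
    return sum(labels[i] for i in indices) / len(indices)


# Assumed toy data: 900 "not high revenue" rows followed by 100 "high revenue" rows.
labels = [0] * 900 + [1] * 100

# Taking only the first rows misses the minority class entirely.
first_rows = list(range(500))
print("first 500 rows:", positive_rate(labels, first_rows))

# A stratified 80/20 split over all rows keeps the minority share in both sets.
train_idx, test_idx = stratified_split(labels)
print("train:", positive_rate(labels, train_idx))
print("test: ", positive_rate(labels, test_idx))
```

With this toy ordering, the first 500 rows contain no high revenue customers at all, while the stratified split keeps them at about 10% in both the train and test sets.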
Class imbalance is an important ML concept (whether using Dataiku or not). You might want to look for external sources to supplement your understanding. This article might be one place to start.