Class imbalance and test and train accuracy scores in Dataiku
Hi All,
1) How do I obtain test set accuracy in Dataiku? I want to make sure that I am not overfitting and that my train set accuracy is close to my test set accuracy. Currently, I don't know how to see the test set accuracy.
2) What are the detailed steps to address class imbalance? Can someone show them with screenshots?
Thank you so much!
S
Answers
JordanB Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 297 Dataiker
Hi @sir,
During the training phase, DSS holds out the test set, and the model is trained only on the train set. This ensures that the evaluation is done on data that the model has “never seen before”. To get test set results, you will need to run an Evaluation recipe:
An Evaluation recipe takes as inputs:
an evaluation dataset
a model
An Evaluation recipe can have up to three outputs:
an Evaluation Store, containing the main Model Evaluation and all associated result screens
an output dataset, containing the input features, prediction and correctness of prediction for each record
a metrics dataset, containing just the performance metrics for this evaluation (i.e. it’s a subset of the Evaluation Store)
The metrics dataset will provide results that you can compare to the train set results.
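As a rough illustration of that comparison, here is a minimal Python sketch that computes test accuracy from the per-record correctness flag of the output dataset and compares it to a train accuracy. The column name `prediction_correct`, the toy rows, and the train accuracy of 0.90 are assumptions for illustration only; check the actual column names in your own output dataset and read the train score from your model report.

```python
def accuracy(rows, correct_col="prediction_correct"):
    """Fraction of records whose prediction was flagged as correct.

    `correct_col` is an assumed column name; adjust it to match
    the output dataset of your Evaluation recipe.
    """
    # Accept booleans as well as "true"/"1" strings from an exported file.
    flags = [str(r[correct_col]).strip().lower() in ("true", "1") for r in rows]
    if not flags:
        raise ValueError("no records to score")
    return sum(flags) / len(flags)


def overfit_gap(train_acc, test_acc):
    """Positive gap means the model scores better on train than on test."""
    return train_acc - test_acc


# Toy stand-in for rows read from the evaluation output dataset.
rows = [
    {"prediction": "yes", "prediction_correct": True},
    {"prediction": "no", "prediction_correct": True},
    {"prediction": "yes", "prediction_correct": False},
    {"prediction": "no", "prediction_correct": True},
]

test_acc = accuracy(rows)   # 3 correct out of 4
train_acc = 0.90            # hypothetical value taken from the training report
gap = overfit_gap(train_acc, test_acc)
print(f"train={train_acc:.2f} test={test_acc:.2f} gap={gap:.2f}")
```

A large positive gap suggests overfitting; a gap near zero suggests the model generalizes about as well as it fits the train set.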
Regarding class imbalance, please refer to the following community post: https://community.dataiku.com/t5/Using-Dataiku/How-to-train-a-classification-model-on-an-imbalanced-dataset/m-p/6119
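For orientation, one common general technique for imbalanced classification is class weighting, where rarer classes get larger weights during training. The sketch below is not necessarily what the linked post recommends and is not a Dataiku API; it only shows the usual "balanced" heuristic, n_samples / (n_classes * class_count), on made-up labels.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical target column with a 90/10 imbalance.
labels = ["neg"] * 90 + ["pos"] * 10
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, the 10 minority records carry the same total weight as the 90 majority records, so the model is not rewarded for simply predicting the majority class.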
Please let us know if you have any questions.
Thanks!
Jordan