Train set for a model has more rows than actual dataset
UserBird
Hi,
I am training a classification model on a dataset that has 3612 rows. However, when I looked at the stats after training the model, the train set had 3752 rows and the test set had 896 rows, which together is far more than the number of rows in the original dataset. What could have caused this? Can you please help me find the reason for this problem?
I used the default settings for the model in DSS.
Thank you
Sam
Answers
-
When I followed your steps, I got a column count of 171 and a record count of 3612.
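For reference, the mismatch can be quantified with a few lines of Python. This is only a rough sketch using the counts quoted in this thread; the function name `split_report` is made up for illustration and is not part of DSS.

```python
def split_report(dataset_rows: int, train_rows: int, test_rows: int) -> dict:
    """Compare the reported train/test sizes with the dataset's row count."""
    total = train_rows + test_rows
    return {
        "train_plus_test": total,
        # Positive value: the splits contain more rows than the dataset.
        "excess_rows": total - dataset_rows,
        # Share of the split rows that ended up in the test set.
        "test_fraction": test_rows / total,
    }


# Counts reported in this thread (hypothetical helper, real numbers).
report = split_report(dataset_rows=3612, train_rows=3752, test_rows=896)
print(report["train_plus_test"])  # 4648
print(report["excess_rows"])      # 1036
```

With these numbers the two sets add up to 4648 rows, 1036 more than the dataset itself, and the test set holds roughly 19% of them.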
-
Could you:
* Retrain your model. In the pre-train modal, make sure to check the "recompute splits" checkbox
* If the problem persists, generate a diagnostic report (Administration > Maintenance > Diagnostic tool)
* Send it to support@dataiku.com (If the file is above 15 MB, you can use WeTransfer or a similar service)
Thanks -
Hi,
I could not find the "recompute splits" checkbox. Can you please point me to where that checkbox is?
Sam -
Sorry, it's called "Drop existing sets, recompute new ones". It is in the "Training models" modal that appears when you click on "Train".
-
I followed your steps and checked "Drop existing sets, recompute new ones", but it still gives the same result: the train set has 3752 rows and the test set has 896 rows.
-
How do we generate diagnostic reports for the model we trained in DSS labs? I could not find any option on the model's page that generates a diagnostic report. Can you help me with this?
Sam