Train set for a model has more rows than actual dataset

UserBird
Dataiker

Hi,

I am training a classification model on a dataset that has 3612 rows. However, when I looked at the stats after training the model, the train set had 3752 rows and the test set had 896 rows, which together is far more than the number of rows in the original dataset. What could have caused this? Can you please help me find the reason for this problem?

I used the default settings for the model in DSS.
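
To make the mismatch concrete, here is a minimal Python sketch of the arithmetic. It uses only the row counts quoted above and does not call any DSS API. The variable names are made up for illustration.

```python
# Row counts as reported in this post (typed in by hand, not read from DSS).
dataset_rows = 3612
train_rows = 3752
test_rows = 896

# Total rows that ended up in the train/test split.
total_split_rows = train_rows + test_rows

# How many more rows the split contains than the source dataset.
extra_rows = total_split_rows - dataset_rows

# Share of the split that went to the test set.
test_fraction = test_rows / total_split_rows

print(f"train + test = {total_split_rows}")
print(f"rows beyond the dataset = {extra_rows}")
print(f"test share of the split = {test_fraction:.1%}")
```

So the split holds 4648 rows in total, 1036 more than the dataset, and the test set is about 19.3% of that total.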





Thank you,
Sam

6 Replies
Sam648
Level 2

When I followed your steps, I got a column count of 171 and a record count of 3612.



Clément_Stenac
Could you:
* Retrain your model. In the pre-train modal, make sure to check the "recompute splits" checkbox
* If the problem persists, generate a diagnostic report (Administration > Maintenance > Diagnostic tool)
* Send it to support@dataiku.com (If the file is above 15 MB, you can use WeTransfer or a similar service)

Thanks
Sam648
Level 2
Hi,
I could not find the "recompute splits" checkbox. Can you please point me to where that checkbox is?
Sam
Clément_Stenac
Sorry, it's called "Drop existing sets, recompute new ones". It is in the "Training models" modal that appears when you click on "Train".
Sam648
Level 2
I followed your steps and checked "Drop existing sets, recompute new ones", but it still gives the same result: the train set has 3752 rows and the test set has 896 rows.
Sam648
Level 2
How do we generate a diagnostic report for the model we trained in the DSS Lab? I could not find any option on the model page that generates a diagnostic report. Can you help me with this?

Sam