Train set for a model has more rows than actual dataset

UserBird Dataiker, Alpha Tester Posts: 535 Dataiker


I am training a classification model on a dataset that has 3612 rows. However, when I looked at the stats after training the model, the train set had 3752 rows and the test set had 896 rows, which together is far more than the number of rows in the original dataset. What could have caused this? Can you please help me find the reason for this problem?

I used the default settings for the model in DSS.

Thank you



  • Sam648
    Sam648 Registered Posts: 13 ✭✭✭✭

    When I followed your steps, I got a column count of 171 and a record count of 3612.

  • Clément_Stenac
    Clément_Stenac Dataiker, Dataiku DSS Core Designer, Registered Posts: 753 Dataiker
    Could you:
    * Retrain your model. In the pre-train modal, make sure to check the "recompute splits" checkbox
    * If the problem persists, generate a diagnostic report (Administration > Maintenance > Diagnostic tool)
    * Send it to (If the file is above 15 MB, you can use WeTransfer or a similar service)

  • Sam648
    Sam648 Registered Posts: 13 ✭✭✭✭
    I could not find the "recompute splits" checkbox. Can you please point me to where that checkbox is?
  • Clément_Stenac
    Clément_Stenac Dataiker, Dataiku DSS Core Designer, Registered Posts: 753 Dataiker
    Sorry, it's called "Drop existing sets, recompute new ones". It is in the "Training models" modal that appears when you click on "Train".
  • Sam648
    Sam648 Registered Posts: 13 ✭✭✭✭
    I followed your steps and selected "Drop existing sets, recompute new ones", but it still gives the same result: the train set has 3752 rows and the test set has 896 rows.
  • Sam648
    Sam648 Registered Posts: 13 ✭✭✭✭
    How do we generate a diagnostic report for a model we trained in the DSS Lab? I could not find any option on the model page that generates a diagnostic report. Can you help me with this?
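
For reference, the row counts reported in this thread can be sanity-checked with a few lines of Python. This is only an arithmetic sketch using the numbers quoted above; the variable names are illustrative and nothing here calls the DSS API.

```python
# Row counts exactly as reported in the thread.
dataset_rows = 3612   # record count of the original dataset
train_rows = 3752     # train set size shown in the model stats
test_rows = 896       # test set size shown in the model stats

# Total number of rows that ended up in the two splits.
split_total = train_rows + test_rows

# How many more rows the splits contain than the dataset itself.
excess = split_total - dataset_rows

# Share of the split rows that went to the train set.
train_fraction = train_rows / split_total

print(f"train + test         = {split_total}")
print(f"excess over dataset  = {excess}")
print(f"train fraction       = {train_fraction:.3f}")
```

The two splits add up to 4648 rows, 1036 more than the dataset, and the train share is about 0.807. That is close to an 80/20 split of the larger total, which may suggest the split was computed on more rows than the 3612 that were counted. The thread itself does not establish the cause.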
