Test / Train Split

Solved!
al-gharak
Level 1
I am using the community version. I have a dataset with 200k rows. I split the dataset and performed feature engineering separately on the train and test sets to prevent data from leaking between them. Is there a way to run training with two separate datasets, one for training and the other for testing? I did see an option for this, but it still asks for a split ratio. My goal is to use 100% of the train dataset for training and 100% of the test dataset for testing. They are in separate folders.


Operating system used: macOS

1 Solution
AlexT
Dataiker

Hi,

You should be able to use the option "Explicit extracts from two datasets" to achieve what you are looking for.

You can change this under Model > Design > Train / test set.

Screenshot 2022-08-22 at 18.09.40.png
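
For readers who want to see the idea outside the visual ML screen, here is a minimal plain-Python sketch of what "explicit extracts from two datasets" amounts to: everything is fitted on 100% of the train data and scored on 100% of the test data, with no split ratio. This is not Dataiku code. The toy data, the helper names (`fit_scaler`, `apply_scaler`, `fit_threshold`) and the trivial threshold "model" are all hypothetical and only there to illustrate the flow.

```python
from statistics import mean, pstdev

def fit_scaler(values):
    """Learn standardization parameters from the training column only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # fall back to 1.0 for a constant column
    return mu, sigma

def apply_scaler(values, mu, sigma):
    """Apply previously learned parameters to any column (train or test)."""
    return [(v - mu) / sigma for v in values]

def fit_threshold(xs, ys):
    """Toy 'model': midpoint between the mean of each class."""
    neg = [x for x, y in zip(xs, ys) if y == 0]
    pos = [x for x, y in zip(xs, ys) if y == 1]
    return (mean(neg) + mean(pos)) / 2

# Hypothetical toy data standing in for the two separate datasets.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1]
test_x, test_y = [1.5, 3.5], [0, 1]

# Statistics come from the train set only, then are reused on the test set,
# so nothing about the test rows influences preprocessing or training.
mu, sigma = fit_scaler(train_x)
train_scaled = apply_scaler(train_x, mu, sigma)
test_scaled = apply_scaler(test_x, mu, sigma)

threshold = fit_threshold(train_scaled, train_y)      # uses all train rows
predictions = [1 if x > threshold else 0 for x in test_scaled]
accuracy = sum(p == y for p, y in zip(predictions, test_y)) / len(test_y)
print(f"accuracy on the full test set: {accuracy:.2f}")
```

The point of the sketch is the direction of the data flow: parameters are learned from the train set and only applied to the test set.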

Let me know if that helps!

al-gharak
Level 1
Author

Thanks for your help. On a separate note, is there a way to handle the train/test split depicted in the attached photo?

Attached picture: IMG_1333.jpg

Rickh008
Level 3

Are you asking about cross-validation (CV)? 5-fold CV is "on" by default and can be changed in the Hyperparameters section under MODELING.

 

Capture.PNG
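
In case it helps to picture what 5-fold CV does with the rows, here is a small plain-Python sketch. It is not how Dataiku implements it, and `k_fold_indices` is a hypothetical helper. It uses contiguous folds without shuffling to keep the example short, whereas real tools typically shuffle the rows first.

```python
def k_fold_indices(n_rows, k=5):
    """Yield (train_idx, validation_idx) pairs for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_rows))
        yield train_idx, validation_idx
        start = stop

# With 10 rows and 5 folds, each fold holds out 2 rows and trains on the other 8.
folds = list(k_fold_indices(10, k=5))
for i, (train_idx, validation_idx) in enumerate(folds, start=1):
    print(f"fold {i}: train on {len(train_idx)} rows, validate on {validation_idx}")
```

Every row is used for validation exactly once and for training in the other four folds, which is the difference from a single fixed train/test split.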
