
Test / Train Split

al-gharak
Level 1
I am using the community version and have a dataset with 200k rows. I split the dataset and did the feature engineering separately on the train and test sets, to prevent data from leaking between them. Is there a way to run training with two separate datasets, one for training and the other for testing? I did see an option for this, but it still asks for a split ratio. My goal is to use 100% of the train dataset for training and 100% of the test dataset for testing. The two datasets are in separate folders.


Operating system used: macOS

AlexT
Dataiker

Hi,

You should be able to use the option "Explicit extracts from two datasets" to achieve what you are looking for.

You can change this under Model > Design > Train / test set.

Screenshot 2022-08-22 at 18.09.40.png

Let me know if that helps!
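For readers who want to see what training on one dataset and testing on another amounts to outside the visual ML interface, here is a minimal sketch in plain Python. It is not Dataiku's API or implementation. The toy rows, the nearest-centroid model, and the names `train_rows`, `test_rows`, `fit_centroids`, `predict`, and `accuracy` are all hypothetical, chosen only to illustrate the idea.

```python
from statistics import mean

# Hypothetical toy data standing in for the two separate datasets.
# Each row is (feature_value, class_label).
train_rows = [(1.0, 0), (1.2, 0), (0.8, 0), (3.0, 1), (3.2, 1), (2.8, 1)]
test_rows = [(0.9, 0), (1.1, 0), (3.1, 1), (2.9, 1)]


def fit_centroids(rows):
    """Learn one mean feature value per class, using only the rows given."""
    by_class = {}
    for x, y in rows:
        by_class.setdefault(y, []).append(x)
    return {label: mean(values) for label, values in by_class.items()}


def predict(centroids, x):
    """Return the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def accuracy(centroids, rows):
    """Fraction of rows whose predicted class matches the true label."""
    hits = sum(1 for x, y in rows if predict(centroids, x) == y)
    return hits / len(rows)


# Every train row is used for fitting; no test row is seen at this stage.
centroids = fit_centroids(train_rows)

# Every test row is used for scoring; nothing is refit here.
test_accuracy = accuracy(centroids, test_rows)
```

The point of the design is that anything learned from data (here, the class centroids) comes from the train rows only, and the test rows are used solely for scoring, so no split ratio is involved.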

al-gharak
Level 1
Author

Thanks for your help. On a separate note, is there a way to set up the train/test split shown in the attached photo?

IMG_1333.jpg

Rickh008
Level 2

Are you asking about cross-validation (CV)? 5-fold CV is "on" by default, and you can change it in the Hyperparameters section under MODELING.
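In case the term is unfamiliar, here is a minimal sketch in plain Python of how k-fold cross-validation partitions rows. It only illustrates the general technique and is not how Dataiku implements it. The function name `k_fold_indices` is hypothetical, and the folds are contiguous and unshuffled to keep the example short, whereas real tools usually shuffle first.

```python
def k_fold_indices(n_rows, k=5):
    """Yield (train_idx, validation_idx) pairs for k contiguous folds.

    Each row index lands in exactly one validation fold; the remaining
    rows form the training part for that fold.
    """
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_rows))
        yield train_idx, validation_idx
        start = stop


# Example: 10 rows split into 5 folds of 2 validation rows each.
folds = list(k_fold_indices(10, k=5))
for train_idx, validation_idx in folds:
    print("train:", train_idx, "validation:", validation_idx)
```

With 5 folds, the model is trained 5 times, each time validated on a different fifth of the rows, and the scores are averaged.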

Capture.PNG
