Test / Train Split
I am using the Community version. I have a dataset with 200k rows. I split it into a train set and a test set and performed the feature engineering separately on each, to prevent data from leaking between the two. Is there a way to run training with two separate datasets, one for training and the other for testing? I did see an option for this, but it still asks for a split ratio. My goal is to use 100% of the train dataset for training and 100% of the test dataset for testing. They are in separate folders.
Operating system used: macOS
Best Answer
-
Alexandru (Dataiker)
Hi,
You should be able to use the option "Explicit extracts from two datasets" to achieve what you are looking for.
You can change this under Model > Design > Train / test set.
Let me know if that helps!
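For readers who want to see the same idea outside DSS, here is a minimal sketch in plain Python of "train on 100% of one dataset, evaluate on 100% of another". It is not how DSS works internally. The toy rows, the single feature, and the helper names (`fit_scaler`, `apply_scaler`, `fit_line`, `mse`) are all hypothetical. It also shows one common way to keep feature engineering leakage-free: the scaling statistics are learned from the train set only and then reused unchanged on the test set.

```python
from statistics import mean, pstdev

def fit_scaler(values):
    """Learn centering/scaling parameters from the training column only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # guard against a constant column
    return mu, sigma

def apply_scaler(values, params):
    """Apply previously learned parameters; nothing is re-fit here."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]

def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b on a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, y_bar - a * x_bar

def mse(model, xs, ys):
    """Mean squared error of the fitted line on the given rows."""
    a, b = model
    return mean((a * x + b - y) ** 2 for x, y in zip(xs, ys))

# Hypothetical toy data standing in for the two separate datasets.
train_x, train_y = [1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 8.1, 9.8]
test_x, test_y = [6.0, 7.0], [12.2, 13.8]

params = fit_scaler(train_x)                                   # stats from train only
model = fit_line(apply_scaler(train_x, params), train_y)       # 100% of train
test_error = mse(model, apply_scaler(test_x, params), test_y)  # 100% of test
print(f"test MSE: {test_error:.4f}")
```

The test rows are never used to compute `params` or `model`, which is the property the two-dataset setup is meant to guarantee.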
Answers
-
Thanks for your help. On a separate note, is there a way to handle the Train/Test split shown in the attached photo?
Attached picture?
-
Are you asking about cross-validation (CV)? 5-fold CV is on by default and can be edited or changed in the Hyperparameters section under MODELING.
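To illustrate what k-fold cross-validation does conceptually, here is a minimal sketch in plain Python that produces the train/validation index pairs for k folds. It is an assumption-level illustration, not DSS's implementation. `k_fold_indices` is a hypothetical helper, and it does not shuffle rows, which real implementations usually offer.

```python
def k_fold_indices(n_rows, k=5):
    """Yield (train_idx, valid_idx) pairs; each row is validated exactly once."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, valid
        start += size

folds = list(k_fold_indices(12, k=5))
for i, (train_idx, valid_idx) in enumerate(folds):
    print(f"fold {i}: train={len(train_idx)} rows, validate={len(valid_idx)} rows")
```

Each model is trained on k-1 folds and scored on the held-out fold, and the k scores are then averaged. This reuses every row for validation once instead of relying on a single split.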