Test / Train Split

al-gharak
al-gharak Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 3 ✭✭✭

I am using the community version. I have a dataset with 200k rows. I split it and performed feature engineering separately on the train and test sets to prevent data from leaking between them. Is there a way to run training with two separate datasets, one for training and the other for testing? I did see an option for this, but it still asks for a split ratio. My goal is to use 100% of the train dataset for training and 100% of the test dataset for testing. They are in separate folders.


Operating system used: macOS

Best Answer

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker

    Hi,

    You should be able to use the option "Explicit extracts from two datasets" to achieve what you are looking for.

    You can change this under Model > Design > Train / test set.

    Let me know if that helps!
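
To illustrate the idea behind keeping the two datasets separate, here is a minimal sketch in plain Python, outside of DSS. It is not Dataiku's implementation: the function names (`fit_scaler`, `apply_scaler`) and the sample values are hypothetical. It only shows the leakage-avoidance pattern from the question, where preprocessing parameters are learned from the train data alone and then reused unchanged on the test data.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Learn scaling parameters from the training values only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # fall back to 1.0 if the variance is zero
    return mu, sigma


def apply_scaler(values, mu, sigma):
    """Standardize values with parameters that were fitted elsewhere."""
    return [(v - mu) / sigma for v in values]


# Hypothetical feature column, already split into two separate datasets.
train_feature = [10.0, 12.0, 14.0, 16.0]
test_feature = [11.0, 20.0]

# Fit on 100% of the train data; the test data never influences mu or sigma.
mu, sigma = fit_scaler(train_feature)

train_scaled = apply_scaler(train_feature, mu, sigma)
test_scaled = apply_scaler(test_feature, mu, sigma)
```

Because `mu` and `sigma` come only from `train_feature`, nothing about the test rows can leak into the features the model is trained on.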

Answers

  • al-gharak
    al-gharak Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 3 ✭✭✭

    Thanks for your help. On a separate note, is there a way to handle the train/test split depicted in the attached photo?

    [Attached picture: IMG_1333.jpg]

  • Rickh008
    Rickh008 Dataiku DSS Core Designer, Registered Posts: 15 ✭✭✭✭

    Are you asking about cross-validation (CV)? 5-fold CV is "on" by default and can be changed in the Hyperparameters section under MODELING.
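
For readers unfamiliar with the term, here is a rough sketch of what a 5-fold split means, written in plain Python. It is not how DSS implements it: `k_fold_indices` is a hypothetical helper, and the sketch uses contiguous folds with no shuffling to keep it short.

```python
def k_fold_indices(n_rows, k=5):
    """Yield (train_idx, validation_idx) pairs for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_rows))
        yield train_idx, validation_idx
        start = stop


# With 10 rows and k=5, each row is used for validation exactly once.
folds = list(k_fold_indices(10, k=5))
for train_idx, validation_idx in folds:
    print("train:", train_idx, "validation:", validation_idx)
```

Each fold trains on the rows outside its validation slice, so every row is held out exactly once across the k rounds.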
