Test / Train Split

Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 3 ✭✭✭

I am using the community version. I have a dataset with 200k rows. I split it into train and test sets and did the feature engineering separately on each, to prevent data from leaking between the two. Is there a way to run training with two separate datasets, one for training and the other for testing? I did see an option for this, but it still asks for a split ratio. My goal is to use 100% of the train dataset for training and 100% of the test dataset for testing. They are in separate folders.


Operating system used: macOS

Best Answer

  • Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,270 Dataiker
    Answer ✓

    Hi,

    You should be able to use the option "Explicit extracts from two datasets" to achieve what you are looking for.

    You can change this under Model > Design > Train / test set.

    Screenshot 2022-08-22 at 18.09.40.png
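
If the feature engineering on the two datasets involves learned statistics (scaling, imputation, encodings), one common way to keep the test set from leaking into training is to learn those statistics on the train dataset only and reuse them on the test dataset. The following is a minimal sketch of that idea in plain Python. It does not use any Dataiku API, and the column values and the helper names `fit_scaler` and `apply_scaler` are hypothetical.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Learn standardization parameters from the training column only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # fall back to 1.0 for a constant column
    return mu, sigma


def apply_scaler(values, params):
    """Apply previously learned parameters to any split."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]


# Hypothetical feature columns coming from two separate datasets.
train_feature = [10.0, 12.0, 14.0, 16.0]
test_feature = [11.0, 20.0]

params = fit_scaler(train_feature)                  # statistics come from train only
train_scaled = apply_scaler(train_feature, params)  # 100% of train is used for fitting
test_scaled = apply_scaler(test_feature, params)    # test reuses the train statistics
```

The test values never influence `params`, so whatever is measured on the test dataset reflects data the preprocessing has not seen.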

    Let me know if that helps!

Answers

  • Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 3 ✭✭✭

    Thanks for your help. On a separate note, is there a way to handle the train/test split depicted in the attached photo?

    Attached picture: IMG_1333.jpg

  • Dataiku DSS Core Designer, Registered Posts: 15 ✭✭✭✭

    Are you asking about cross-validation (CV)? 5-fold CV is "on" by default and can be changed in the Hyperparameters section under MODELING.

    Capture.PNG
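
For readers unfamiliar with the term, k-fold cross-validation splits the rows into k parts and uses each part once for validation while the remaining parts are used for training. The sketch below shows how the row indices could be divided for 5 folds. It is a generic illustration in plain Python, not how DSS implements it, and `k_fold_indices` is a hypothetical helper. In practice the rows are usually shuffled first.

```python
def k_fold_indices(n_rows, k=5):
    """Yield (train_idx, validation_idx) pairs for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_rows))
        yield train_idx, validation_idx
        start = stop


folds = list(k_fold_indices(n_rows=10, k=5))
for train_idx, validation_idx in folds:
    # Each row lands in exactly one validation fold across the 5 rounds.
    print(len(train_idx), len(validation_idx))
```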
