Beginner Machine Learning query

sidstack

Hello,

I will be working on my first supervised machine learning problem, where I will be provided with either:

1. a training dataset (train) and a prediction dataset (test), or

2. several files that I have to join to build the train and test datasets.

In DSS, after I finish training a model on the train set, how can I use it to predict on the test set in each of the two scenarios above?

Thanks

Answers

  • tgb417

    @sidstack,

    Welcome to the Dataiku Community.

    Visual model building has a built-in option to split your data into train and test sets automatically, either by percentage or by N-fold cross-validation. Once a model is deployed to the flow, you can use the evaluate and predict (score) visual recipes with it. To evaluate your model against a validation dataset, add an evaluation recipe. To make predictions against new data, add a predict (score) recipe that takes the deployed model and the new dataset as inputs.
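    As a rough illustration of the same train / evaluate / score pattern outside of DSS, here is a minimal sketch in plain Python. It is not the Dataiku API: the data, the nearest-centroid model, and the names `train_centroids` and `predict` are all made up for the example. The point is only that the hold-out split is used to check the model, and the fitted model is then applied unchanged to rows that have no target.

```python
import random
from statistics import mean

def train_centroids(rows):
    """Fit a toy nearest-centroid classifier: one mean feature vector per label."""
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append(features)
    return {label: [mean(col) for col in zip(*vecs)]
            for label, vecs in by_label.items()}

def predict(model, features):
    """Return the label whose centroid is closest to the given features."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: dist2(model[label]))

# Hypothetical labeled data (the "train" dataset): (features, target) pairs.
labeled = ([([0.0 + i * 0.1, 0.0], "a") for i in range(10)]
           + [([5.0 + i * 0.1, 5.0], "b") for i in range(10)])
# Hypothetical unseen data (the "test" dataset): features only, no target.
unseen = [[0.2, 0.1], [5.3, 4.9]]

# 1. Split the labeled data 80/20 into a training part and a hold-out part.
rng = random.Random(0)
shuffled = labeled[:]
rng.shuffle(shuffled)
cut = int(0.8 * len(shuffled))
train_rows, holdout_rows = shuffled[:cut], shuffled[cut:]

# 2. Train on the training part only.
model = train_centroids(train_rows)

# 3. Evaluate on the hold-out part, where the true target is known.
hits = sum(predict(model, f) == y for f, y in holdout_rows)
accuracy = hits / len(holdout_rows)

# 4. Score the unseen rows with the trained model.
predictions = [predict(model, f) for f in unseen]
print(accuracy, predictions)
```

    In DSS terms, steps 1 to 3 correspond roughly to what the visual ML design and the evaluation recipe do for you, and step 4 to what the predict (score) recipe does.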

    I'd like to invite you to take a look at some of the Dataiku Academy learning modules on visual machine learning. In particular, to answer your questions, I'd look at Machine Learning Basics and Scoring Basics. I found both of these useful.

    I hope that this is of help to you.

  • sidstack

    Hello, thank you so much for your reply. I have actually done those courses that you mentioned, but what I was unclear on was how to score unseen data with the train set scores.

  • tgb417

    @sidstack

    I'm not exactly clear what you mean by "score unseen data with the train set scores".

    From my point of view, for classification and regression problems, training data is seen; in fact, it contains a ground truth target variable.
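    If it helps with your second scenario, here is a small sketch (plain Python again, not DSS code) of building both datasets from several joined files. The tables, column names, and the `build_dataset` helper are hypothetical. The idea is that the same join logic is applied to both sets, and the only difference is that the training rows carry the target while the rows to be scored do not.

```python
# Hypothetical source "files", already loaded as lists of dicts.
customers = [{"id": 1, "age": 34}, {"id": 2, "age": 51}, {"id": 3, "age": 27}]
orders = [{"id": 1, "n_orders": 5}, {"id": 2, "n_orders": 1}, {"id": 3, "n_orders": 8}]
train_labels = [{"id": 1, "churned": 0}, {"id": 2, "churned": 1}]  # target known
test_ids = [{"id": 3}]                                             # target unknown

def build_dataset(base_rows, *lookup_tables, key="id"):
    """Left-join each lookup table onto base_rows using the given key."""
    out = []
    for row in base_rows:
        merged = dict(row)
        for table in lookup_tables:
            # Take the first matching row, or nothing if the key is absent.
            match = next((r for r in table if r[key] == row[key]), {})
            merged.update(match)
        out.append(merged)
    return out

# The same preparation is applied to both sets, so the features line up.
train = build_dataset(train_labels, customers, orders)
test = build_dataset(test_ids, customers, orders)

print(train)
print(test)
```

    In a DSS flow, the equivalent would be to run both datasets through the same join and preparation steps before training on one and scoring the other.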

    Or are you thinking about the process of updating your model over time? That is, as you get new data, do you want to improve your model by retraining it?

    If you are talking about the ongoing process of learning from data as it comes in for future predictions, I believe that is handled by Scenarios (a feature not available in the free version).

    That is what I have for now unless you can explain just a little bit more about what you are trying to do. Others please feel free to jump in here and help.

    --Tom
