I will be working on my first supervised Machine Learning problem, where I will be provided either:
1. a training dataset (train) and a prediction dataset (test), or
2. several files to join in order to build the train and test datasets.
In DSS, after I finish training a model on the train set, how can I use it to predict on the test set in the above two scenarios?
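For context on scenario 2, here is roughly what I mean by the join step, sketched in pandas (the file contents and key names here are just made-up placeholders):

```python
import pandas as pd

# Hypothetical source files: one with features, one with labels
customers = pd.DataFrame({"id": [1, 2, 3], "age": [25, 40, 31]})
labels = pd.DataFrame({"id": [1, 2], "target": [0, 1]})

# Left-join features to labels on the shared key
merged = customers.merge(labels, on="id", how="left")

# Rows with a known label form the train set; the rest are the test set
train = merged[merged["target"].notna()]
test = merged[merged["target"].isna()].drop(columns=["target"])
```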
Welcome to the Dataiku Community.
Built into the visual model building is the option to automatically split train/test, either by percentage or by N-fold cross-validation. Once a model is deployed to the Flow, you can use the Evaluate and Score visual recipes: add an Evaluate recipe to measure the model against a validation dataset with known labels, then a Score recipe to make predictions on new data.
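Conceptually, the same train/evaluate/score flow can be sketched outside the visual tool with scikit-learn; this is only an illustration of the idea, not Dataiku-specific code, and the dataset and column names are invented:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-ins for the train and (unlabeled) test datasets
rng = np.random.default_rng(0)
train = pd.DataFrame({"f1": rng.normal(size=200), "f2": rng.normal(size=200)})
train["target"] = (train["f1"] + train["f2"] > 0).astype(int)
test = pd.DataFrame({"f1": rng.normal(size=50), "f2": rng.normal(size=50)})

X, y = train[["f1", "f2"]], train["target"]
# Internal split the visual ML tool performs automatically for evaluation
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
val_accuracy = model.score(X_val, y_val)         # the "Evaluate" step
predictions = model.predict(test[["f1", "f2"]])  # the "Score" step on unseen data
```

The key point is that the fitted model object is what gets reused: evaluation uses held-out labeled rows, while scoring applies the same model to rows that have no target at all.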
I'd like to invite you to take a look at some of the Dataiku Academy learning modules on visual machine learning. In particular, to answer your questions, I'd look at Machine Learning Basics and Scoring Basics. I found both of these useful.
I hope that this is of help to you.
Hello, thank you so much for your reply. I have actually done those courses that you mentioned, but what I was unclear on was how to score unseen data with the train set scores.
I'm not exactly clear what you mean by "score unseen data with the train set scores".
From my point of view, for classification and regression problems the training data is "seen"; in fact, it contains the ground-truth target variable.
Or are you thinking about the process of updating your model over time, retraining on new data as it arrives to make the model better?
If you are talking about the ongoing process of learning from incoming data to improve future predictions, I believe that is handled by Scenarios (a feature not available in the free version).
That is all I have for now, unless you can explain a little more about what you are trying to do. Others, please feel free to jump in here and help.