Added on December 12, 2022 4:51PM
Replies: 2
Context:
My question: after retraining the model with the "train" recipe, is the resulting new active model fit on the entire dataset (as best practice sometimes suggests)?
I can't find any information on this final fitting strategy in the recipe (see screenshot below), and I could not find it in Dataiku's documentation either.
Operating system used: Windows 10
So I checked with the evaluate recipe, comparing the metrics on both the train set and the test set: the model built by the train recipe is indeed trained only on the train set (and not on the entire dataset).
I tried to force Dataiku to train the model on the entire dataset, but there is no option to do so. The only workaround I found was to build a fake test set with just two samples (one with a positive target and one with a negative target, because Dataiku raises an error if the test set does not contain an observation for every target class in a classification task).
A feature that allows training a model on the entire dataset (without necessarily evaluating that model) would be highly appreciated.
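For anyone who wants to reproduce the two-sample workaround described above, here is a minimal sketch in plain Python. It is only an illustration, not a Dataiku API: the function name `split_minimal_test_set`, the list-of-dicts row format, and the `target` column name are all assumptions made for the example. It holds out the first row found for each target class as the tiny test set and keeps everything else for training.

```python
def split_minimal_test_set(rows, target_key):
    """Hold out exactly one row per target class as a tiny test set.

    rows: list of dicts, one per observation (hypothetical format).
    target_key: name of the target column (hypothetical).
    Returns (train_rows, test_rows).
    """
    seen_classes = set()
    train_rows, test_rows = [], []
    for row in rows:
        label = row[target_key]
        if label not in seen_classes:
            # First row seen for this class goes to the minimal test set.
            seen_classes.add(label)
            test_rows.append(row)
        else:
            # Every other row stays available for training.
            train_rows.append(row)
    return train_rows, test_rows


# Toy binary classification data (made up for illustration).
rows = [
    {"id": 1, "target": 1},
    {"id": 2, "target": 0},
    {"id": 3, "target": 1},
    {"id": 4, "target": 0},
    {"id": 5, "target": 1},
]
train_rows, test_rows = split_minimal_test_set(rows, "target")
print(len(train_rows), len(test_rows))
```

Note that the held-out rows are still not seen during training, so this only approximates training on 100% of the data, and a class with a single row would end up in the test set only. Metrics computed on such a tiny test set are meaningless and should be ignored.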
Just FYI, with the latest version (Version 12), we can choose the "Train on 100% and split for performance" setting in the train recipe. With that setting, all of the data is used for training.