When training a model with a visual recipe, does Dataiku fit the model on the entire dataset?
Context:
- I have deployed a model to the flow
- I want to retrain that model with its associated "train" recipe
- I understand that the model's performance is evaluated using a test set or K-folds under a cross-validation strategy
My question: after retraining the model with the "train" recipe, is the new active model fit on the entire dataset (as best practice sometimes recommends)?
I can't find any information on this final fitting strategy in the recipe (see screenshot below), and I could not find it in Dataiku's documentation either.
Operating system used: Windows 10
Best Answers
-
Tanguy Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron, Dataiku DSS Adv Designer, Registered, Dataiku DSS Developer, Neuron 2023 Posts: 118 Neuron
I checked with the evaluate recipe by comparing the metrics on the train set and on the test set: the model built by the train recipe is indeed trained only on the train set (and not on the entire dataset).
I tried forcing Dataiku to train the model on the entire dataset, but there is no option to do so. The only workaround I found was to build a fake test set with just two samples (one with a positive target and one with a negative target, because Dataiku raises an error if it does not have an observation for every target class in a classification task).
A feature that allows training a model on the entire dataset (without necessarily evaluating that model) would be highly appreciated.
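For illustration, the fake test set workaround could be prepared along these lines. This is only a minimal sketch in plain Python, not Dataiku's API: the `minimal_holdout` helper, the `target` column name, and the sample rows are all hypothetical. It sets aside exactly one row per target class and leaves everything else for training.

```python
def minimal_holdout(rows, target_key):
    """Split rows so the test set holds exactly one row per target class.

    Hypothetical helper: the first row seen for each class goes to the
    test set, every other row stays in the train set.
    """
    seen_classes = set()
    train_rows, test_rows = [], []
    for row in rows:
        label = row[target_key]
        if label not in seen_classes:
            seen_classes.add(label)
            test_rows.append(row)
        else:
            train_rows.append(row)
    return train_rows, test_rows


# Made-up sample data for a binary classification task.
rows = [
    {"x": 0.1, "target": 0},
    {"x": 0.9, "target": 1},
    {"x": 0.2, "target": 0},
    {"x": 0.8, "target": 1},
    {"x": 0.3, "target": 0},
]

train_rows, test_rows = minimal_holdout(rows, "target")
print(len(train_rows), len(test_rows))
```

Metrics computed on such a tiny test set are meaningless, so this only makes sense when the model has already been evaluated separately.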
-
Tsuyoshi Dataiker, PartnerAdmin, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 137 Dataiker
Just FYI, with the latest version (Version 12), we can choose the "Train on 100% and split for performance" setting in the train recipe. With it, all of the available data is used for training.
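As a rough illustration of the general pattern the setting's name suggests (estimate performance on a split, then refit on all rows), here is a minimal sketch in plain Python. It is not Dataiku's implementation: the toy nearest-centroid classifier, the synthetic data, and the 80/20 split are all assumptions made for the example.

```python
import random


def fit_centroids(rows):
    """Fit a toy nearest-centroid classifier: mean feature value per class."""
    sums, counts = {}, {}
    for x, y in rows:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Return the class whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def accuracy(centroids, rows):
    """Fraction of rows whose predicted class matches the true class."""
    return sum(predict(centroids, x) == y for x, y in rows) / len(rows)


# Synthetic one-feature dataset with two classes (assumption for the demo).
rng = random.Random(0)
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(100)]
data += [(rng.gauss(3.0, 1.0), 1) for _ in range(100)]
rng.shuffle(data)

# Step 1: split only to estimate performance.
split = int(0.8 * len(data))
train, test = data[:split], data[split:]
eval_model = fit_centroids(train)
test_accuracy = accuracy(eval_model, test)

# Step 2: refit on 100% of the rows; this is the model that gets deployed.
final_model = fit_centroids(data)

print("estimated accuracy:", test_accuracy)
print("final model centroids:", final_model)
```

The reported accuracy comes from the model fit on the 80% split, so it is an estimate for the final model rather than a direct measurement of it.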