When training a model with a visual recipe, does Dataiku fit the model on the entire dataset?

Tanguy · December 2022

Context:

I have deployed a model to the Flow.
I want to retrain that model with its associated "train" recipe.
I understand that the model's performance is evaluated using a test set, or K folds under a cross-validation strategy.

My question: after retraining the model with the "train" recipe, is the resulting new active model fit on the entire dataset (as best practice sometimes recommends)?
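For readers unfamiliar with the practice referred to here (estimate performance on a held-out split, then refit the final model on every row), it can be sketched outside Dataiku as follows. This is only a plain-Python illustration with a toy nearest-centroid classifier; the function names and the data are hypothetical, and it makes no claim about what the train recipe does internally.

```python
from statistics import mean

def fit_centroids(rows):
    """Fit a toy nearest-centroid classifier: one mean feature value per class."""
    by_class = {}
    for x, y in rows:
        by_class.setdefault(y, []).append(x)
    return {label: mean(xs) for label, xs in by_class.items()}

def predict(model, x):
    """Return the class whose centroid is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))

def accuracy(model, rows):
    """Share of rows whose predicted class matches the true class."""
    return sum(predict(model, x) == y for x, y in rows) / len(rows)

def evaluate_then_refit(rows, n_test):
    """Estimate accuracy on the last n_test rows, then refit on all rows."""
    train, test = rows[:-n_test], rows[-n_test:]
    estimate = accuracy(fit_centroids(train), test)  # model never saw `test`
    final_model = fit_centroids(rows)                # final fit uses every row
    return estimate, final_model

# Hypothetical (feature, target) pairs.
data = [(0.9, 0), (1.1, 0), (4.9, 1), (5.1, 1), (1.0, 0), (5.0, 1)]
estimate, final_model = evaluate_then_refit(data, n_test=2)
```

The reported estimate describes the model fit on the training split, not the refitted one; the usual assumption is that the final model, having seen more data, performs at least as well.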

I can't find any information on this final fitting strategy in the recipe (see screenshot below), and I could not find it in Dataiku's documentation either.

Operating system used: Windows 10

Tanguy · December 2022

I checked this with the evaluate recipe by comparing the metrics on the train set and on the test set: the model built by the train recipe is indeed trained only on the train set (and not on the entire dataset).

I tried to force Dataiku to train the model on the entire dataset, but there is no option to do so. The only workaround I found was to build a fake test set with just two samples (one with a positive target and one with a negative target, because Dataiku raises an error if it does not have an observation for every target class in a classification task).
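A minimal sketch of how such a two-row test set could be built, for example before uploading it as a separate dataset. It is plain Python, not a Dataiku API; the function name, the column names and the rows are hypothetical, and it assumes the selected rows are copied (not removed) so that the training data stays complete.

```python
def tiny_test_set(rows, target):
    """Return the first row found for each distinct value of `target`.

    The input rows are left untouched, so the full dataset can still be
    used for training.
    """
    first_by_class = {}
    for row in rows:
        # setdefault keeps only the first row seen for each class
        first_by_class.setdefault(row[target], row)
    return list(first_by_class.values())

# Hypothetical binary classification data.
rows = [
    {"age": 31, "churn": 1},
    {"age": 45, "churn": 0},
    {"age": 52, "churn": 0},
    {"age": 23, "churn": 1},
]
test = tiny_test_set(rows, "churn")
```

Because these rows also appear in the training data, and there are only two of them, any metric computed on this test set carries no information about real performance.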

A feature that allows training a model on the entire dataset (without necessarily evaluating that model) would be highly appreciated.

Tsuyoshi · April 2024

Just FYI: with the latest version (version 12), we can choose the "Train on 100% and split for performance" setting in the train recipe. With that setting, all of the data is used for training.

[Screenshot: train recipe of a random forest regression prediction model in Dataiku, 2024-04-09]
