When training a model with a visual recipe, does Dataiku fit the model on the entire dataset?

Solved!
tanguy

Context:

  1. I have deployed a model to the Flow
  2. I want to retrain that model with its associated "train" recipe
  3. I understand that the model's performance is evaluated either on a held-out test set or with K-fold cross-validation

My question: after retraining the model with the "train" recipe, is the resulting new active model fit on the entire dataset (as is sometimes recommended as best practice once the evaluation is done)?
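To make the practice in question concrete, here is a minimal sketch outside Dataiku, using scikit-learn on synthetic data (both are assumptions for illustration and say nothing about what the train recipe does internally): the model is first scored with K-fold cross-validation, then a fresh copy is fit on every available row.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data, used only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Step 1: estimate performance with 5-fold cross-validation.
# Each fold's model is fit on 4/5 of the rows and scored on the rest.
cv_scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=5, scoring="roc_auc"
)
print("mean CV AUC:", cv_scores.mean())

# Step 2: the "final fit" this question is about.
# A fresh model is trained on the entire dataset, with no rows held out.
final_model = LogisticRegression(max_iter=1000).fit(X, y)
```

In this pattern the cross-validation scores describe the modelling procedure, while the deployed model is the one from step 2.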

I can't find any information on this final fitting strategy in the recipe settings (see screenshot below), and I did not find it in Dataiku's documentation either.

model_train_settings.jpg

 


Operating system used: Windows 10

 

1 Solution
tanguy
Author

I checked this with the evaluate recipe by comparing the metrics on the train set and on the test set: the model built by the train recipe is indeed trained only on the train set, not on the entire dataset.
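The reasoning behind this check can be reproduced outside Dataiku. The sketch below is an illustration under assumptions (scikit-learn, synthetic noisy data, an unpruned decision tree chosen because it memorises its training rows); it is not the evaluate recipe itself. A model fit only on the train split scores far better on train than on test, whereas a model fit on all rows scores equally well on both.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with label noise so that a held-out set is hard to predict perfectly.
X, y = make_classification(n_samples=400, n_features=10, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def train_and_test_accuracy(model):
    """Return (accuracy on the train split, accuracy on the test split)."""
    return (
        accuracy_score(y_train, model.predict(X_train)),
        accuracy_score(y_test, model.predict(X_test)),
    )


# Case A: model fit on the train split only.
fit_on_train = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# Case B: model fit on every row, test rows included.
fit_on_all = DecisionTreeClassifier(random_state=0).fit(X, y)

acc_a = train_and_test_accuracy(fit_on_train)
acc_b = train_and_test_accuracy(fit_on_all)
print("fit on train only (train, test):", acc_a)
print("fit on all rows   (train, test):", acc_b)
```

A clear gap between the two numbers in case A, and no gap in case B, is the signature that distinguishes the two fitting strategies.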

I tried to force Dataiku to train the model on the entire dataset, but there is no option to do so. The only workaround I found was to build a fake test set with just two samples: one with a positive target and one with a negative target, because Dataiku raises an error if the test set does not contain at least one observation for every target class in a classification task.
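As a rough sketch of how such a token test set could be prepared (for example in a Python step upstream of the train recipe), the snippet below uses pandas on a made-up dataframe. The column names `feature` and `target` are hypothetical, and holding the two rows out of the train set is an assumption of this sketch; how the two resulting datasets are then plugged into the train/test settings is not shown.

```python
import pandas as pd

# Made-up data standing in for the real dataset; column names are hypothetical.
df = pd.DataFrame(
    {
        "feature": [0.1, 0.4, 0.35, 0.8, 0.9, 0.2],
        "target": [0, 0, 1, 1, 1, 0],
    }
)

# Token test set: the first row of each target class, so every class is present.
test_df = df.groupby("target").head(1)

# Train set: all remaining rows, i.e. the whole dataset minus one row per class.
train_df = df.drop(test_df.index)

print(test_df)
print(len(train_df), "rows left for training")
```

Metrics computed on a test set this small carry no information, so the evaluation results of a model trained this way should be ignored.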

A feature that allows training a model on the entire dataset (without necessarily evaluating that model) would be highly appreciated.

