Implementing grid search in a Custom Model. Is CV doubled?

Antal
Level 3
Hi,

I'm implementing grid search in a Custom Model in VisualML, and it seems to work. But I'm wondering about something.

Judging by the training time, the grid search CV is indeed being performed.

Will the CV I implemented in the custom model code run in addition to the "normal" CV (from the Hyperparameters tab), or is that one disabled in this case? If it isn't disabled, I'd have CV nested within the CV folds already made by VisualML itself.

My code looks something like this:

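from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

# Base estimator whose hyperparameters are tuned
model = AdaBoostClassifier()

# 3 x 3 = 9 candidate combinations
params = {
    "n_estimators": [10, 20, 50],
    "learning_rate": [0.1, 0.5, 1.0],
}

# 5-fold CV over the grid; refit=True retrains the best candidate on all the data passed to fit()
clf = GridSearchCV(estimator=model, param_grid=params, refit=True, cv=5)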
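
To make the "CV within CV" concern concrete, here is a rough count of model fits for the grid above. This is only arithmetic on my side: `outer_folds` is a hypothetical value for the case where VisualML would run the whole custom `clf` once per fold of its own k-fold, which is exactly the behaviour I'm unsure about.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the custom model code
params = {
    "n_estimators": [10, 20, 50],
    "learning_rate": [0.1, 0.5, 1.0],
}

inner_folds = 5  # cv=5 passed to GridSearchCV
outer_folds = 5  # hypothetical: an outer k-fold applied around the custom model

n_candidates = len(ParameterGrid(params))         # 3 * 3 combinations
fits_per_search = n_candidates * inner_folds + 1  # +1 for the final refit=True fit
fits_if_nested = outer_folds * fits_per_search    # whole search repeated per outer fold

print(n_candidates, fits_per_search, fits_if_nested)  # prints: 9 46 230
```

So a single grid search costs 46 fits, and it would be about five times that if the outer CV were applied on top, which should be visible in the training time.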

Also, is there any way to communicate the chosen hyperparameters and maybe a variable importance back to VisualML from a Custom Model? Even if it's only visible in the logs.
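
One idea I could try for getting this into the logs: a small `GridSearchCV` subclass that reports the result at the end of `fit()`. This is an untested sketch as far as Dataiku is concerned. `LoggingGridSearchCV` is a name I made up, and I'm assuming that VisualML accepts a subclass as `clf` and that anything printed or logged during training ends up in the training log. The class may also need to live in a project library rather than in the code editor so that it can be pickled.

```python
import logging

from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

logger = logging.getLogger(__name__)


class LoggingGridSearchCV(GridSearchCV):
    """GridSearchCV that reports the selected hyperparameters after fitting.

    Hypothetical helper, not part of scikit-learn or Dataiku.
    """

    def fit(self, X, y=None, **fit_params):
        # Run the normal grid search (including the refit=True final fit)
        super().fit(X, y, **fit_params)

        message = "Grid search best params: %s (mean CV score: %.4f)" % (
            self.best_params_,
            self.best_score_,
        )
        # Not every estimator exposes feature_importances_, so look it up defensively
        importances = getattr(self.best_estimator_, "feature_importances_", None)
        if importances is not None:
            # Indices refer to the columns of X as passed to fit()
            message += " | feature importances by column index: %s" % (
                [round(float(v), 4) for v in importances],
            )

        # Emit through both channels, since I don't know which one the platform captures
        print(message, flush=True)
        logger.info(message)
        return self


params = {
    "n_estimators": [10, 20, 50],
    "learning_rate": [0.1, 0.5, 1.0],
}

clf = LoggingGridSearchCV(
    estimator=AdaBoostClassifier(), param_grid=params, refit=True, cv=5
)
```

The importances are indexed by column position only, because the estimator just sees the preprocessed matrix and not the original feature names.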


Operating system used: Windows

1 Reply
Antal
Level 3
Author

Hmm, I seem to have found some clues in the log after model training.

This part:

[2022-12-15 09:11:20,931] [22053/MainThread] [INFO] [dataiku.doctor.prediction.common] Using stratified K-Fold CV with k=5
[2022-12-15 09:11:20,937] [22053/MainThread] [INFO] [dataiku.doctor.crossval.search_evaluation_monitor] No distributed container configuration is available to run this search
[2022-12-15 09:11:20,939] [22053/MainThread] [INFO] [dataiku.doctor.crossval.search_runner] Got single-point space, not performing hyperparameter search
[2022-12-15 09:11:20,939] [22053/MainThread] [INFO] [dataiku.doctor.crossval.strategies.grid_search_strategy] Running GridSearchStrategy for hyperparameters space: <dataiku.doctor.prediction.common.GridHyperparametersSpace object at 0x7f1302169e90>

This seems to indicate that the normal k-fold CV isn't used for a hyperparameter search here: since there are no hyperparameters defined in the UI, the search space is a single point and the log says "not performing hyperparameter search".

Too bad I can't see the final parameters chosen. I guess I could save the model to the Flow and manipulate the object with the Python public API to look inside the clf.

Or is there an easier way to print these out somehow?
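
For reference, this is roughly what I'd want to read off the fitted object once I have it in hand. How to get the fitted `clf` out of a saved model is not shown here, because I haven't worked that part out. The sketch uses only scikit-learn attributes, `summarize_search` is a helper name I made up, and the synthetic data just stands in for the real training set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV


def summarize_search(search, top=3):
    """Collect the chosen parameters and the best-ranked candidates of a fitted search."""
    results = search.cv_results_
    # Stable sort so that tied candidates keep their original grid order
    order = np.argsort(results["rank_test_score"], kind="stable")[:top]
    top_candidates = [
        (results["params"][i], float(results["mean_test_score"][i])) for i in order
    ]
    return {
        "best_params": search.best_params_,
        "best_score": float(search.best_score_),
        "top_candidates": top_candidates,
        # None when the refitted estimator has no feature_importances_
        "feature_importances": getattr(
            search.best_estimator_, "feature_importances_", None
        ),
    }


# Stand-in data; in practice the search would already have been fitted during training
X, y = make_classification(n_samples=120, n_features=6, random_state=0)
search = GridSearchCV(
    estimator=AdaBoostClassifier(),
    param_grid={"n_estimators": [10, 20, 50], "learning_rate": [0.1, 0.5, 1.0]},
    refit=True,
    cv=5,
)
search.fit(X, y)

summary = summarize_search(search)
print(summary["best_params"], summary["best_score"])
for candidate_params, mean_score in summary["top_candidates"]:
    print(candidate_params, round(mean_score, 4))
```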
