Implementing grid search in a Custom Model. Is CV doubled?
Hi,
I'm implementing grid search in a Custom Model in VisualML, which seems to work. But I'm wondering about something.
Judging by the training time, the grid search CV is indeed being performed.
Will the CV I implemented in the custom model code run in addition to the "normal" CV (from the Hyperparameters tab), or is that disabled in this case? If it isn't disabled, I'd have CV nested within the CV folds already made by VisualML itself.
My code looks something like this:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

model = AdaBoostClassifier()
params = {
    "n_estimators": [10, 20, 50],
    "learning_rate": [.1, .5, 1]
}
clf = GridSearchCV(estimator=model, param_grid=params, refit=True, cv=5)
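To put a rough number on the "doubled CV" worry, here is a minimal sketch that counts model fits for the grid above. The outer fold count of 5 is an assumption for illustration only; whether VisualML actually wraps the custom model in its own folds is exactly the open question.

```python
from sklearn.model_selection import ParameterGrid

params = {
    "n_estimators": [10, 20, 50],
    "learning_rate": [0.1, 0.5, 1],
}

# Number of hyperparameter combinations in the grid (3 x 3).
n_candidates = len(ParameterGrid(params))
inner_folds = 5  # cv=5 in the GridSearchCV call

# One GridSearchCV.fit trains every candidate on every inner fold,
# then refits the best candidate once on all the data (refit=True).
fits_per_search = n_candidates * inner_folds + 1

# Hypothetical: if an outer 5-fold loop called fit() once per fold,
# the whole search would be repeated for each outer fold.
outer_folds = 5  # assumed value, not taken from VisualML
fits_if_nested = outer_folds * fits_per_search

print(f"fits per search: {fits_per_search}, if nested: {fits_if_nested}")
```

Comparing the observed training time against these two counts is one way to guess which of the two cases applies.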
Also, is there any way to communicate the chosen hyperparameters, and maybe the variable importances, back to VisualML from a Custom Model? Even if they're only visible in the logs.
Operating system used: Windows
Answers
Antal
Hmm, I seem to have found some clues in the log after model training.
This part:
[2022-12-15 09:11:20,931] [22053/MainThread] [INFO] [dataiku.doctor.prediction.common] Using stratified K-Fold CV with k=5
[2022-12-15 09:11:20,937] [22053/MainThread] [INFO] [dataiku.doctor.crossval.search_evaluation_monitor] No distributed container configuration is available to run this search
[2022-12-15 09:11:20,939] [22053/MainThread] [INFO] [dataiku.doctor.crossval.search_runner] Got single-point space, not performing hyperparameter search
[2022-12-15 09:11:20,939] [22053/MainThread] [INFO] [dataiku.doctor.crossval.strategies.grid_search_strategy] Running GridSearchStrategy for hyperparameters space: <dataiku.doctor.prediction.common.GridHyperparametersSpace object at 0x7f1302169e90>
This seems to indicate that the normal k-fold CV search isn't run, because no hyperparameters are defined in the UI: the log reports a single-point space and says it is not performing a hyperparameter search.
Too bad I can't see the final parameters that were chosen. I guess I could save the model to the flow and use the Python public API to look inside the clf object.
Or is there an easier way to print these out somehow?
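One possible approach, sketched below with plain scikit-learn, is to subclass GridSearchCV so that it logs the chosen hyperparameters and feature importances right after fitting. `LoggingGridSearchCV` and the logger name `custom_model` are hypothetical names introduced here. Whether VisualML accepts such a subclass as `clf`, and whether its training log captures Python `logging` output from custom model code, are assumptions that would need checking.

```python
import logging

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

logger = logging.getLogger("custom_model")  # hypothetical logger name


class LoggingGridSearchCV(GridSearchCV):
    """GridSearchCV that logs its results after fitting (illustrative helper)."""

    def fit(self, X, y=None, **fit_params):
        super().fit(X, y, **fit_params)
        logger.info("Best params: %s", self.best_params_)
        logger.info("Best CV score: %.4f", self.best_score_)
        # Not every estimator exposes feature_importances_, so look it up safely.
        importances = getattr(self.best_estimator_, "feature_importances_", None)
        if importances is not None:
            logger.info("Feature importances: %s", importances.tolist())
        return self


params = {
    "n_estimators": [10, 20, 50],
    "learning_rate": [0.1, 0.5, 1],
}
clf = LoggingGridSearchCV(
    estimator=AdaBoostClassifier(random_state=0),
    param_grid=params,
    refit=True,
    cv=5,
)

# Stand-alone check on synthetic data; inside VisualML the platform calls fit().
logging.basicConfig(level=logging.INFO)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
clf.fit(X, y)
```

Because the subclass keeps the GridSearchCV constructor unchanged, it should still behave like a regular scikit-learn estimator; only the extra log lines differ.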