Implementing grid search in a Custom Model. Is CV doubled?

Antal
edited July 16 in Using Dataiku

Hi,

I'm implementing a grid search inside a Custom Model in VisualML, and it seems to work, but I'm wondering about one thing.

Judging by the training time, the cross-validation inside GridSearchCV is indeed being performed.

Will the CV I implemented in the custom model code run in addition to the "normal" CV configured in the Hyperparameters tab, or is the latter disabled in this case? If both run, I'd end up with a CV nested inside the CV folds that VisualML already creates.

My code looks something like this:

from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

model = AdaBoostClassifier()

params = {
    "n_estimators": [10, 20, 50],
    "learning_rate": [.1, .5, 1]
}

clf = GridSearchCV(estimator=model, param_grid=params, refit=True, cv=5)
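
To make the concern concrete: if VisualML also ran its own k-fold CV around my estimator, the result would resemble nested cross-validation. Below is a minimal sketch of that scenario in plain scikit-learn, on synthetic data. It only illustrates what "cv within cv" would mean; I'm not claiming this is what DSS does internally, and the dataset and the `outer_cv` name are made up for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in data (assumption, for illustration only)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

params = {
    "n_estimators": [10, 20, 50],
    "learning_rate": [0.1, 0.5, 1.0],
}

# Inner loop: the grid search from the custom model code
inner = GridSearchCV(
    estimator=AdaBoostClassifier(random_state=0),
    param_grid=params,
    refit=True,
    cv=5,
)

# Outer loop: stands in for a k-fold CV run around the whole estimator
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer_scores = cross_val_score(inner, X, y, cv=outer_cv)

# Each outer fold runs 9 candidates x 5 inner folds + 1 refit = 46 fits,
# so 5 outer folds cost 230 fits in total.
print(outer_scores.mean())
```

So if both loops ran, I'd expect training to take roughly five times as long as the grid search alone.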

Also, is there any way to pass the chosen hyperparameters, and maybe the variable importances, back to VisualML from a Custom Model? Even if they're only visible in the logs.


Operating system used: Windows

Answers

  • Antal
    edited July 17

    Hmm, I seem to have found some clues in the log after model training.

    This part:

    [2022-12-15 09:11:20,931] [22053/MainThread] [INFO] [dataiku.doctor.prediction.common] Using stratified K-Fold CV with k=5
    [2022-12-15 09:11:20,937] [22053/MainThread] [INFO] [dataiku.doctor.crossval.search_evaluation_monitor] No distributed container configuration is available to run this search
    [2022-12-15 09:11:20,939] [22053/MainThread] [INFO] [dataiku.doctor.crossval.search_runner] Got single-point space, not performing hyperparameter search
    [2022-12-15 09:11:20,939] [22053/MainThread] [INFO] [dataiku.doctor.crossval.strategies.grid_search_strategy] Running GridSearchStrategy for hyperparameters space: <dataiku.doctor.prediction.common.GridHyperparametersSpace object at 0x7f1302169e90>

    This seems to indicate that the normal k-fold CV hyperparameter search isn't performed, because no hyperparameters are defined in the UI: the log reports a single-point space and "not performing hyperparameter search".

    Too bad I can't see the final parameters that were chosen. I guess I could save the model to the flow and use the Python public API to look inside the clf object.

    Or is there an easier way to print these out somehow?
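
One idea I might try is a thin subclass of GridSearchCV that reports the winning parameters right after fitting and forwards the importances of the refit estimator. This is a rough sketch under assumptions: `LoggingGridSearchCV` and the logger name are names I made up, and I haven't confirmed that output printed or logged from custom model code ends up in the training log, or that VisualML reads `feature_importances_` from a custom model. The synthetic data at the bottom is only there to make the example runnable outside DSS.

```python
import logging

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

logger = logging.getLogger("custom_model")  # hypothetical logger name


class LoggingGridSearchCV(GridSearchCV):
    """GridSearchCV that reports its best parameters after fitting (hypothetical helper)."""

    def fit(self, X, y=None, **kwargs):
        super().fit(X, y, **kwargs)
        # Report through both channels, since it is an assumption
        # which of them (if any) reaches the training log.
        logger.info("Best params: %s", self.best_params_)
        print("Best params:", self.best_params_)
        return self

    @property
    def feature_importances_(self):
        # Forward the importances of the refit best estimator (requires refit=True)
        return self.best_estimator_.feature_importances_


params = {
    "n_estimators": [10, 20, 50],
    "learning_rate": [0.1, 0.5, 1.0],
}

clf = LoggingGridSearchCV(
    estimator=AdaBoostClassifier(random_state=0),
    param_grid=params,
    refit=True,
    cv=5,
)

# Local check on synthetic data (assumption, not part of the DSS custom model code)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
clf.fit(X, y)
print(clf.feature_importances_)
```

Inside the Custom Model editor only the class definition and the `clf = ...` assignment would be needed; the last three lines are just for checking the behaviour locally.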
