Error while redeploying model from a notebook

Mohammed · March 2024

I am trying to build an adjusting model in Dataiku, and I want to use the APIs to do so.
I already have a model deployed in the flow. After each data refresh, I want to check the model's performance and, if it falls below a threshold, retrain the model. Below is the code I am using:

if trained_model_MAPE > ERROR_THRESHOLD:
    # Wait for the ML task to be ready
    mltask.wait_guess_complete()
    # Obtain the settings and enable least-squares regression
    settings = mltask.get_settings()
    settings.set_algorithm_enabled("LEASTSQUARE_REGRESSION", True)

    # Iterate over all features in the dataset and set their use/rejection
    features_to_reject = []
    def handle_feature(feature_name, feature_params):
        if feature_name not in current_features and feature_params["role"] == 'INPUT':
            features_to_reject.append(feature_name)
        return feature_params
 
    settings.foreach_feature(handle_feature)
    for feature_name in current_features:
        settings.use_feature(feature_name)
    for feature_name in features_to_reject:
        settings.reject_feature(feature_name)
 
 
    settings.save()
    mltask.start_train()
    mltask.wait_train_complete()
    # Get the identifiers of the trained models
    ids = mltask.get_trained_models_ids()
    mape_list = []
    for id in ids:
        details = mltask.get_trained_model_details(id)
        algorithm = details.get_modeling_settings()["algorithm"]
        mape = details.get_performance_metrics()["mape"]
        print(f"Algorithm={algorithm} MAPE={mape}")
        mape_list.append(mape)

    # Select the best model, i.e. the lowest MAPE (assumes pandas is imported as pd)
    best_model_index = pd.Series(mape_list).idxmin()
    # Deploy the best model
    model_to_deploy = ids[best_model_index]

I have the following questions:
1) How do I use a different dataset as the test data (the "Explicit extract from two datasets" policy in the Design tab of the Lab)? Currently, I set up the mltask as shown below.
I want to use MasterData as my training data and another dataset, "UpcomingData", as my test data.

mltask = p.create_prediction_ml_task(
    input_dataset="MasterData",
    target_variable="DV",
    ml_backend_type='PY_MEMORY', # ML backend to use
    guess_policy='DEFAULT' # Template to use for setting default parameters
)

2) I want to use just two algorithms, GBT_REGRESSION and LEASTSQUARE_REGRESSION. Currently, there are some default algorithms as well. How do I restrict the use of algorithms?

3) I am getting the following error while redeploying the model in the flow. Why am I getting this error? I want to replace the model I built previously before the data refresh with a new model. Is it possible in Dataiku?

ret = mltask.redeploy_to_flow(model_to_deploy,saved_model_id="EXISTING_MODEL_ID")

DataikuException: com.dataiku.dip.exceptions.APIIllegalArgumentException: Saved model is not built by a recipe of this project

Operating system used: Windows

AdrienL · March 2024

1) You would use set_split_explicit on the mltask settings' split params, obtained via get_split_params.
2) You could call disable_all_algorithms before enabling the ones you want.
3) This depends a bit on the specifics of your project and may need some of your project's configuration to troubleshoot. I suggest getting in touch with Dataiku's customer support.
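
For the second point, a minimal sketch of restricting the task to two algorithms might look like the following. The helper name restrict_algorithms is hypothetical; it only wraps the disable_all_algorithms and set_algorithm_enabled calls mentioned in this thread, and it assumes `settings` is the object returned by mltask.get_settings().

```python
def restrict_algorithms(settings, wanted):
    """Enable only the algorithms named in `wanted` (hypothetical helper)."""
    # Switch everything off first so defaults from the guess policy do not linger
    settings.disable_all_algorithms()
    # Then re-enable just the requested algorithms
    for name in wanted:
        settings.set_algorithm_enabled(name, True)
    # Persist the change before training
    settings.save()


# Usage inside DSS (assumes `mltask` was created as in the question):
# settings = mltask.get_settings()
# restrict_algorithms(settings, ["GBT_REGRESSION", "LEASTSQUARE_REGRESSION"])
# mltask.start_train()
```

Disabling everything first avoids having to know which algorithms the guess policy turned on by default.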

Mohammed · April 2024

@AdrienL
Thanks for the reply.
Can you elaborate on the first point?
I'm still not able to use two different datasets, one for training and one for testing. I tried the following:

settings.get_split_params().set_split_explicit(test_dataset_name="UpcomingData_Toronto")

It gives the error below:
TypeError: set_split_explicit() missing 2 required positional arguments: 'train_selection' and 'test_selection'
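
The TypeError indicates that set_split_explicit expects a train selection and a test selection as its first two positional arguments, in addition to the dataset names. A hedged sketch of a call that supplies them is shown below. The helper name use_separate_test_dataset is hypothetical, and the dataset_name keyword, the DSSDatasetSelectionBuilder import path, and its with_all_data_sampling method are assumptions based on the dataikuapi client that should be checked against your installed version.

```python
def use_separate_test_dataset(settings, train_dataset, test_dataset,
                              train_selection, test_selection):
    """Switch the split policy to an explicit extract from two datasets
    (hypothetical helper around set_split_explicit)."""
    split = settings.get_split_params()
    # Both selections are required positional arguments; leaving them out
    # produces the TypeError quoted above.
    split.set_split_explicit(
        train_selection,
        test_selection,
        dataset_name=train_dataset,      # assumed keyword for the train dataset
        test_dataset_name=test_dataset,  # keyword used in the attempt above
    )
    settings.save()


# Usage inside DSS (assumes `mltask` exists; the import below is an assumption):
# from dataikuapi.dss.utils import DSSDatasetSelectionBuilder
# settings = mltask.get_settings()
# use_separate_test_dataset(
#     settings, "MasterData", "UpcomingData_Toronto",
#     DSSDatasetSelectionBuilder().with_all_data_sampling(),
#     DSSDatasetSelectionBuilder().with_all_data_sampling(),
# )
```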
