How to train a model on a partitioned dataset using the API

Mohammed · May 17

How do I build a model on a partitioned dataset in Dataiku using the API?
I'm using the code below to train the model. How should I modify it to build a partitioned model?
"trainset" is partitioned on the column "Market".

# client is a DSS API client

p = client.get_project("MYPROJECT")

# Create a new ML Task to predict the variable "target" from "trainset"
mltask = p.create_prediction_ml_task(
    input_dataset="trainset",
    target_variable="target",
    ml_backend_type='PY_MEMORY', # ML backend to use
    guess_policy='DEFAULT' # Template to use for setting default parameters
)

# Wait for the ML task to be ready
mltask.wait_guess_complete()

# Obtain settings, enable GBT, save settings
settings = mltask.get_settings()
settings.set_algorithm_enabled("GBT_CLASSIFICATION", True)
settings.save()

# Start train and wait for it to be complete
mltask.start_train()
mltask.wait_train_complete()

# Get the identifiers of the trained models
# There will be 3 of them because logistic regression and random forest are enabled by default
ids = mltask.get_trained_models_ids()

for model_id in ids:
    details = mltask.get_trained_model_details(model_id)
    algorithm = details.get_modeling_settings()["algorithm"]
    auc = details.get_performance_metrics()["auc"]

    print("Algorithm=%s AUC=%s" % (algorithm, auc))

# Let's deploy the first model
model_to_deploy = ids[0]

ret = mltask.deploy_to_flow(model_to_deploy, "my_model", "trainset")

print("Deployed to saved model id = %s train recipe = %s" % (ret["savedModelId"], ret["trainRecipeName"]))

I appreciate any help you can provide.

Operating system used: Windows

AdrienL · May 17

Hi,

You can try setting the following on the ML task's settings before saving them:

settings.get_raw()['partitionedModel']['enabled'] = True
settings.save()
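
To show where this fits in the script from the question, here is a minimal sketch that wraps the change in a small helper. The helper name `enable_partitioned_training` is hypothetical, and the `'partitionedModel'` / `'enabled'` keys are taken from the snippet above rather than verified against every DSS version. The `setdefault` call is a defensive assumption in case that section is missing from the raw settings. Any further partition-selection options the ML task may need are not covered here.

```python
def enable_partitioned_training(settings):
    """Turn on partitioned-model training and save the settings.

    `settings` is assumed to behave like the object returned by
    mltask.get_settings(): get_raw() returns a mutable dict and
    save() persists it. (Hypothetical helper, not part of the API.)
    """
    raw = settings.get_raw()
    # Key names come from the answer above; setdefault guards
    # against the section being absent (an assumption).
    raw.setdefault("partitionedModel", {})["enabled"] = True
    settings.save()
    return raw["partitionedModel"]


# Intended placement in the original script (needs a live DSS instance):
#   settings = mltask.get_settings()
#   settings.set_algorithm_enabled("GBT_CLASSIFICATION", True)
#   enable_partitioned_training(settings)   # replaces settings.save()
#   mltask.start_train()
```

The helper must run before `mltask.start_train()`, since settings changed after training starts would not affect that training session.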
