How to train a model on a partitioned dataset using the API
Mohammed
How do I build a model on a partitioned dataset in Dataiku using the API?
I'm using the code below to develop the model. How do I modify it to build a partitioned model?
"trainset" is partitioned on the column "Market".
# client is a DSS API client, e.g. from a notebook or recipe inside DSS:
import dataiku
client = dataiku.api_client()
p = client.get_project("MYPROJECT")
# Create a new ML Task to predict the variable "target" from "trainset"
mltask = p.create_prediction_ml_task(
input_dataset="trainset",
target_variable="target",
ml_backend_type='PY_MEMORY', # ML backend to use
guess_policy='DEFAULT' # Template to use for setting default parameters
)
# Wait for the ML task to be ready
mltask.wait_guess_complete()
# Obtain settings, enable GBT, save settings
settings = mltask.get_settings()
settings.set_algorithm_enabled("GBT_CLASSIFICATION", True)
settings.save()
# Start training and wait for it to complete
mltask.start_train()
mltask.wait_train_complete()
# Get the identifiers of the trained models
# There will be 3 of them because Logistic regression and Random forest are enabled by default
ids = mltask.get_trained_models_ids()
for model_id in ids:
    details = mltask.get_trained_model_details(model_id)
    algorithm = details.get_modeling_settings()["algorithm"]
    auc = details.get_performance_metrics()["auc"]
    print("Algorithm=%s AUC=%s" % (algorithm, auc))
# Let's deploy the first model
model_to_deploy = ids[0]
ret = mltask.deploy_to_flow(model_to_deploy, "my_model", "trainset")
print("Deployed to saved model id = %s train recipe = %s" % (ret["savedModelId"], ret["trainRecipeName"]))
I appreciate any help you can provide.
Operating system used: Windows
Best Answer
Hi,
You can try enabling partitioned modeling in the ML task's raw settings and then saving them, before calling mltask.start_train():
settings.get_raw()['partitionedModel']['enabled'] = True
settings.save()
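As a minimal sketch, the two lines above can be wrapped in a small helper and called in the question's script after the algorithm settings are changed and before training starts. The helper name `enable_partitioned_model` is hypothetical; it uses only `get_raw()` and `save()` from the answer, and, like the answer's one-liner, it assumes the raw ML task settings already contain a `partitionedModel` entry. Instead of silently creating that entry, it raises if it is missing.

```python
def enable_partitioned_model(settings):
    """Turn on partitioned modeling in an ML task's settings and save them.

    `settings` is expected to behave like the object returned by
    mltask.get_settings(): get_raw() returns the mutable raw settings dict,
    and save() persists it back to DSS.
    """
    raw = settings.get_raw()
    if "partitionedModel" not in raw:
        # Assumption carried over from the answer: this key exists.
        # Fail loudly rather than inventing a structure DSS may not expect.
        raise KeyError("ML task settings have no 'partitionedModel' entry")
    raw["partitionedModel"]["enabled"] = True
    settings.save()
    return raw["partitionedModel"]


# Hypothetical usage inside the question's script:
# settings = mltask.get_settings()
# settings.set_algorithm_enabled("GBT_CLASSIFICATION", True)
# enable_partitioned_model(settings)
# mltask.start_train()
# mltask.wait_train_complete()
```

Because the helper only depends on `get_raw()` and `save()`, it can be tested with a simple stand-in object before running it against a real DSS instance.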