How to train a model on a partitioned dataset using the API

Mohammed (Dataiku DSS Core Designer, Dataiku DSS ML Practitioner)
edited July 16 in Using Dataiku

How do I build a model on a partitioned dataset in Dataiku using the API?
I'm using the code below to develop the model. How do I modify it to build a partitioned model?
The dataset "trainset" is partitioned on the column "Market".

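import dataiku

# client is a DSS API client; from code running inside DSS it can be obtained with:
client = dataiku.api_client()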

p = client.get_project("MYPROJECT")

# Create a new ML Task to predict the variable "target" from "trainset"
mltask = p.create_prediction_ml_task(
    input_dataset="trainset",
    target_variable="target",
    ml_backend_type='PY_MEMORY', # ML backend to use
    guess_policy='DEFAULT' # Template to use for setting default parameters
)

# Wait for the ML task to be ready
mltask.wait_guess_complete()

# Obtain settings, enable GBT, save settings
settings = mltask.get_settings()
settings.set_algorithm_enabled("GBT_CLASSIFICATION", True)
settings.save()

# Start train and wait for it to be complete
mltask.start_train()
mltask.wait_train_complete()

# Get the identifiers of the trained models
# There will be 3 of them because Logistic regression and Random forest were default enabled
ids = mltask.get_trained_models_ids()

for model_id in ids:
    details = mltask.get_trained_model_details(model_id)
    algorithm = details.get_modeling_settings()["algorithm"]
    auc = details.get_performance_metrics()["auc"]

    print("Algorithm=%s AUC=%s" % (algorithm, auc))

# Let's deploy the first model
model_to_deploy = ids[0]

ret = mltask.deploy_to_flow(model_to_deploy, "my_model", "trainset")

print("Deployed to saved model id = %s train recipe = %s" % (ret["savedModelId"], ret["trainRecipeName"]))

I appreciate any help you can provide.

Operating system used: Windows

Best Answer

    AdrienL (Dataiker)
    edited July 17

    You can try setting this flag on the ML task's settings before saving them:

    settings.get_raw()['partitionedModel']['enabled'] = True
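A minimal sketch of how the answer's one-liner could slot into the question's settings-and-train steps. It assumes `mltask` behaves like the object returned by `create_prediction_ml_task` and that its raw settings contain a `partitionedModel` section, as the answer implies; the helper names `enable_partitioned_model` and `train_partitioned_model` are hypothetical. Because the ML task is passed in as a parameter, the sketch needs no imports and can be exercised with stand-in objects outside DSS.

```python
def enable_partitioned_model(settings):
    """Set the partitioned-model flag in an ML task's raw settings (not yet saved)."""
    raw = settings.get_raw()
    # Same mutation as the accepted answer; raises KeyError if the raw
    # settings have no "partitionedModel" section.
    raw["partitionedModel"]["enabled"] = True


def train_partitioned_model(mltask, algorithm="GBT_CLASSIFICATION"):
    """Enable an algorithm and partitioned modeling, save, train, return model ids.

    Hypothetical helper: `mltask` is expected to behave like the ML task
    returned by project.create_prediction_ml_task in the question.
    """
    settings = mltask.get_settings()
    settings.set_algorithm_enabled(algorithm, True)
    enable_partitioned_model(settings)
    # get_raw() edits a local copy; save() sends it to DSS, so save before training.
    settings.save()
    mltask.start_train()
    mltask.wait_train_complete()
    return mltask.get_trained_models_ids()
```

With `mltask` created as in the question, `ids = train_partitioned_model(mltask)` would stand in for the settings and training steps, while the evaluation loop and `deploy_to_flow` call stay as they are. Keeping the flag change in its own function makes it easy to confirm that it runs before `settings.save()`, which is the ordering the answer calls for.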
