How to train a model on a partitioned dataset using the API
Mohammed
How do I build a model on a partitioned dataset in Dataiku using the API?
I'm using the code below to train the model. How do I modify it to build a partitioned model?
The "trainset" dataset is partitioned on the "Market" column.
```python
# client is a DSS API client
p = client.get_project("MYPROJECT")

# Create a new ML Task to predict the variable "target" from "trainset"
mltask = p.create_prediction_ml_task(
    input_dataset="trainset",
    target_variable="target",
    ml_backend_type='PY_MEMORY',  # ML backend to use
    guess_policy='DEFAULT'  # Template to use for setting default parameters
)

# Wait for the ML task to be ready
mltask.wait_guess_complete()

# Obtain settings, enable GBT, save settings
settings = mltask.get_settings()
settings.set_algorithm_enabled("GBT_CLASSIFICATION", True)
settings.save()

# Start train and wait for it to be complete
mltask.start_train()
mltask.wait_train_complete()

# Get the identifiers of the trained models
# There will be 3 of them because Logistic regression and Random forest were default enabled
ids = mltask.get_trained_models_ids()

for id in ids:
    details = mltask.get_trained_model_details(id)
    algorithm = details.get_modeling_settings()["algorithm"]
    auc = details.get_performance_metrics()["auc"]
    print("Algorithm=%s AUC=%s" % (algorithm, auc))

# Let's deploy the first model
model_to_deploy = ids[0]
ret = mltask.deploy_to_flow(model_to_deploy, "my_model", "trainset")
print("Deployed to saved model id = %s train recipe = %s" % (ret["savedModelId"], ret["trainRecipeName"]))
```
I appreciate any help you can provide.
Operating system used: Windows
Best Answer
Hi,
You can try enabling the partitioned model flag in the ML task's raw settings before saving them:

```python
settings.get_raw()['partitionedModel']['enabled'] = True
settings.save()
```
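To show where this fits in the original script, here is a minimal sketch. It assumes, as the snippet above implies, that `settings.get_raw()` returns a mutable dict with a `partitionedModel` entry. The helper name `enable_partitioned_model` is made up for illustration and is not part of the Dataiku API; it only wraps the flag change so that it also works if the key is missing.

```python
def enable_partitioned_model(raw_settings):
    """Turn on the partitioned-model flag in a raw ML task settings dict.

    Hypothetical helper (not a Dataiku API call). It assumes the raw
    settings are a plain mutable dict, as returned by settings.get_raw().
    """
    # Create the 'partitionedModel' section if it is absent, then enable it.
    partitioned = raw_settings.setdefault("partitionedModel", {})
    partitioned["enabled"] = True
    return raw_settings


# Quick check on a stand-in dict (no DSS connection needed):
example_raw = {"partitionedModel": {"enabled": False}}
enable_partitioned_model(example_raw)
print(example_raw["partitionedModel"]["enabled"])
```

In the script from the question, the call would go between `settings.set_algorithm_enabled("GBT_CLASSIFICATION", True)` and `settings.save()`, for example `enable_partitioned_model(settings.get_raw())`. Whether other partition-related settings are also needed may depend on the DSS version, so it is worth checking the saved settings in the UI after the first run.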