Redeploy the same model after retraining.

Solved!
gnaldi62
Neuron

Hi all,

We have duplicated a project. The original project contains a predictive model that was trained with multiple algorithms, and the one with the best score was deployed. In the new project a few columns have been deleted from the initial training dataset, so we need to retrain and redeploy the model (otherwise scoring fails, because the dataset to score has the same columns removed).

Because the project contains many such models, we are trying to automate everything with a script using the Python API.

So far we retrieve all the saved models and, from each of them, the original ml_task:

saved_models = this_project.list_saved_models()
for smod in saved_models:
  if smod['name'] in saved_models_names:
    current_model = smod['id']
    current_saved_model = this_project.get_saved_model(current_model)
    current_ml_task = current_saved_model.get_origin_ml_task()

Training the model also works:

  try:
    list_trained = current_ml_task.train()
  ....

The question is: how can we redeploy, from the list of trained models, the one that corresponds to the algorithm originally deployed? For example, if it was "Lasso (L1)" (which we don't know in advance), how can we (if feasible):

  a) retrieve the name of the algorithm originally deployed (from the model)?

  b) redeploy the retrained model that uses that same algorithm?

Txs. Rgds.

Giuseppe

arnaudde
Dataiker

Hello Giuseppe,

You can read the lastExportedFrom field from the saved model's settings; it gives you the id of the trained model the saved model was deployed from. From there you can get the trained model details and the name of the algorithm. Here is some sample code:

saved_model = client.get_project("MYPROJECT").get_saved_model("n1EpkFGp")
trained_model_id = saved_model.get_settings().get_raw()["lastExportedFrom"]
mltask = saved_model.get_origin_ml_task()
mltask.get_trained_model_details(trained_model_id).get_raw()["modeling"]["algorithm"]


You can then redeploy the model using the deploy_to_flow method.

You can also have a look at the example in the doc 

Hope it helps,
Arnaud

gnaldi62
Neuron
Author

Hi Arnaud,

Thank you for the details. One doubt: if we use "deploy_to_flow" we get a new recipe, right? But what if we want to keep the original recipe (we have many steps downstream)? Would it be OK to use "redeploy_to_flow" instead?

Thanks. Regards.

Giuseppe

arnaudde
Dataiker

Hello Giuseppe,
Indeed you should use redeploy_to_flow not deploy_to_flow, my bad.
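
For reference, the whole flow discussed in this thread could be sketched roughly as below. This is only a minimal sketch, not a tested recipe: the helper names algorithm_of and retrain_and_redeploy are made up for illustration, and it assumes that lastExportedFrom is present in the saved model settings and that the trained model it points to is still readable from the ML task.

```python
def algorithm_of(mltask, model_id):
    """Read the algorithm name from a trained model's raw details."""
    details = mltask.get_trained_model_details(model_id)
    return details.get_raw()["modeling"]["algorithm"]


def retrain_and_redeploy(mltask, saved_model, saved_model_id):
    """Retrain the ML task and redeploy the model that uses the same
    algorithm as the one originally deployed. Returns the new model id.

    mltask         -- result of saved_model.get_origin_ml_task()
    saved_model    -- result of project.get_saved_model(saved_model_id)
    saved_model_id -- id of the saved model to update in the Flow
    """
    # Look up the originally deployed algorithm before retraining.
    original_id = saved_model.get_settings().get_raw()["lastExportedFrom"]
    original_algo = algorithm_of(mltask, original_id)

    # train() returns the ids of the models trained in the new session.
    trained_ids = mltask.train()

    for model_id in trained_ids:
        if algorithm_of(mltask, model_id) == original_algo:
            # Update the existing saved model instead of creating a new one.
            mltask.redeploy_to_flow(model_id, saved_model_id=saved_model_id)
            return model_id

    raise ValueError("no retrained model uses algorithm %r" % original_algo)
```

Called inside the loop from the original question, this would look like retrain_and_redeploy(current_ml_task, current_saved_model, current_model).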

Best,

Arnaud

gnaldi62
Neuron
Author

Great. Many thanks. Rgds.

Giuseppe

gnaldi62
Neuron
Author

Quick update: when the project is duplicated, the lastExportedFrom information is lost (probably because it is a unique id that stays with the original project?).

The current workaround is to retrain and simply redeploy one of the trained models. At least the scoring recipe does not fail, and the script can go on building all the remaining datasets.
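
In case it helps someone else, the selection logic with this fallback could look something like the sketch below. It is only an illustration: choose_model_to_redeploy and the algorithm_of callable (mapping a trained model id to its algorithm name) are hypothetical helpers, and it assumes that a lost lastExportedFrom shows up as a missing or empty key in the raw settings.

```python
def choose_model_to_redeploy(raw_settings, trained_ids, algorithm_of):
    """Pick which retrained model to redeploy.

    raw_settings -- dict from saved_model.get_settings().get_raw()
    trained_ids  -- list of model ids returned by mltask.train()
    algorithm_of -- hypothetical callable: model id -> algorithm name
    """
    if not trained_ids:
        raise ValueError("training produced no models")

    # In a duplicated project this key may be gone, so do not index directly.
    original_id = raw_settings.get("lastExportedFrom")
    if original_id:
        original_algo = algorithm_of(original_id)
        for model_id in trained_ids:
            if algorithm_of(model_id) == original_algo:
                return model_id

    # Fallback (the workaround above): just take one of the trained models.
    return trained_ids[0]
```

The returned id can then be passed to redeploy_to_flow as before. Note that the fallback does not pick the best-scoring model, only the first one in the list.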

Rgds.

Giuseppe
