Retrain only the best model from a visual analysis
I'm using a visual analysis to train two machine learning algorithms (Random Forest and XGBoost), with a grid search of 20 iterations for each algorithm. XGBoost turns out to perform better, so I deploy it to the Flow. So far so good.
If I retrain the model by clicking the "Retrain" button on the diamond-shaped saved model, DSS takes a long time, which seems strange for a single model. Judging from the logs and the new model version, DSS runs the 20-iteration grid search on XGBoost again, whereas I would like it to retrain only the model with the best hyperparameters identified in the visual analysis (they are also available under Model Information > Algorithm).
How can I do that properly? One idea is to run a new session inside the visual analysis, selecting only the XGBoost algorithm and manually entering the best set of hyperparameters, but maybe there's a more efficient way.
Best Answer
Hello,
When you deploy a model from the Lab to the Flow, there is an advanced setting that lets you retrain with only the best set of hyperparameters.
Go back to the Lab, click on your model > Deploy, choose whether to deploy it to an existing saved model or create a new one, then click "Advanced" at the bottom left of the modal.
Then, for "Model parameters", select "Use already detected parameters".
Finally, click "Create". This should create a train recipe with the behaviour you are looking for.
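To make the difference concrete, here is a minimal Python sketch of the two retraining behaviours. It is not DSS code: ToyTrainer, grid_search and GRID are hypothetical names, and the scoring function is a made-up stand-in for fitting and validating a real model. It only illustrates why re-running the search costs 20 fits while reusing the detected parameters costs one.

```python
from itertools import product


class ToyTrainer:
    """Hypothetical stand-in for a model trainer; counts how many fits it runs."""

    def __init__(self):
        self.fits = 0

    def fit_and_score(self, params):
        # Pretend to train a model and return a validation score (higher is better).
        # The formula is invented for illustration and peaks at
        # max_depth=6, learning_rate=0.1.
        self.fits += 1
        return -abs(params["max_depth"] - 6) - 10 * abs(params["learning_rate"] - 0.1)


def grid_search(trainer, grid):
    """Fit every combination in the grid and return the best-scoring parameters."""
    keys = list(grid)
    candidates = [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]
    scored = [(trainer.fit_and_score(p), p) for p in candidates]
    _, best_params = max(scored, key=lambda pair: pair[0])
    return best_params


# Assumed grid: 5 x 4 = 20 combinations, mirroring the 20-iteration search.
GRID = {
    "max_depth": [2, 4, 6, 8, 10],
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
}

# Behaviour 1: retraining re-runs the whole search (20 fits).
search_trainer = ToyTrainer()
best_params = grid_search(search_trainer, GRID)
print("fits during search:", search_trainer.fits)

# Behaviour 2: retraining reuses the already detected parameters (1 fit).
retrain_trainer = ToyTrainer()
retrain_trainer.fit_and_score(best_params)
print("fits during retrain:", retrain_trainer.fits)
```

One trade-off to keep in mind: with the parameters frozen, a retrain on new data will not re-tune them, so it may be worth re-running the search in the Lab if the data changes substantially.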
Hope this helps,
Best regards
Answers
RicSpd
This is exactly what I was looking for! Too bad I didn't spot the "Advanced" button. Thanks a lot @Nicolas_Servel