Hello Dataiku team and community,
I've recently finished reading the MLE book by Andriy Burkov and also earned my ML Practitioner Certificate from Dataiku Academy (yay, really excited!!). I found a huge overlap between the book and the capabilities available in DSS (that's amazing!!), and there is a single topic that I would like to bring up here to see what DSS offers (that I might have missed) and/or what the community has practiced in their projects.
The author mentions four Deployment Strategies, which I describe in my words here:
Single deployment – a current model in production is just replaced by a new model. Simplest approach but also implying big risks in case the new model wasn’t sufficiently validated prior to deployment. Negative impacts may immediately be felt by all consumers.
Silent deployment – a new model is deployed but doesn’t replace the current one. Instead, they run in parallel (100% of requests to both) and every prediction from the new model is only logged. The logged information can be further checked until there is enough confidence to effectively replace the current model. Very low risk of negatively impacting consumers, though it also requires manual checks and is prone to human error.
Canary Deployment – the new model is deployed and starts to immediately serve a predefined portion of consumers (e.g., 10% of requests served by the new model, and 90% still being routed to the previous one). Negative impacts may immediately be felt by 10% of the consumers. Even if this is not critical, verifications here still require manual checks and are prone to human error.
Multi-Armed Bandits – MAB works similarly to canary deployment, but the portion of requests is dynamically and iteratively handled by the algorithm, always converging at the end to 100% of requests going to the best model (the new one or the previous one). This is the most sophisticated approach and seems to be less prone to human error.
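To make the canary split concrete, here is a minimal sketch of how a routing layer could send a fixed share of consumers to the new model. It only illustrates the idea and is not how DSS implements its own traffic split; the function name `route_request`, the `consumer_id` key and the labels "new" / "current" are assumptions made for this example.

```python
import hashlib


def route_request(consumer_id: str, canary_percent: int = 10) -> str:
    """Pick the model version that should serve this consumer.

    Hashing the consumer ID (instead of drawing a random number per request)
    keeps each consumer on the same version for the whole canary period.
    """
    digest = hashlib.sha256(consumer_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return "new" if bucket < canary_percent else "current"


if __name__ == "__main__":
    # Hypothetical consumer IDs, only to show the call.
    for cid in ("consumer-1", "consumer-2", "consumer-3"):
        print(cid, "->", route_request(cid))
```

Raising `canary_percent` step by step (10, 25, 50, 100) gives a gradual rollout, and setting it back to 0 is the rollback.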
And here is the intersection that I found (or haven't found):
|Deployment Strategy|Dataiku DSS|Conclusion|
|---|---|---|
|Single deployment|Naturally supported with the "Update" option of API deployment.|Supported|
|Silent deployment|Partially supported by deploying parallel versions of an API, but the rest of the work (deciding which API is only logged and which one actually serves) still has to be done in other software layers (consumers).|Not natively supported|
|Canary deployment|Naturally supported in the general settings of the deployment with the Active version mode -> Multiple Generations option.|Supported|
|Multi-armed bandits|Same as Silent deployment: it apparently has to be handled mostly outside DSS, in other software layers.|Not natively supported|
So, my intention is to bring it here, double-check my understanding with the community, and see what DSS offers (and I might have missed) and/or the community has practiced on their projects!
Thank you all
Hi Daniel, thanks for this interesting input!
For many custom deployment modes, it is important to underline that they need the "ground truth" data to assess the quality of the predictions made by the models in production. It is the only way to quantify the quality of a model and therefore to detect what you call "negative impacts" (which I assume are, in their simplest form, bad predictions).
That being said, by combining several DSS features such as the Event Server, the dispatcher endpoint pattern and all the public API endpoints related to the Deployer and the API Deployments, you should have enough flexibility to implement something that comes close to what you describe for "silent deployments" and "multi-armed bandits". In fact, since DSS users deploy ML models in many different ways, DSS opts for flexibility instead of packaging pre-built deployment modes, and lets users easily craft their own tailor-made solution.
Thanks @HarizoR! Indeed, for Silent deployment, a custom plain Python endpoint could easily call two individual prediction endpoints and serve only one of them, keeping the other one only for logging (and comparison) purposes. The Event Server would store everything about all requests. We do use it in general, but I hadn't thought about using it for Silent deployment. Thanks for helping me figure this out!
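A rough sketch of that dispatcher idea, kept independent of any DSS API: the two callables `predict_current` and `predict_candidate` are hypothetical stand-ins for whatever code actually queries the two prediction endpoints, and the standard `logging` module stands in for the audit trail that the Event Server would provide.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("silent_deployment")

Features = Dict[str, Any]
Predictor = Callable[[Features], Any]


def make_dispatcher(predict_current: Predictor,
                    predict_candidate: Predictor) -> Callable[[Features], Any]:
    """Build a dispatcher that serves the current model and shadows the candidate."""

    def dispatch(features: Features) -> Any:
        # The consumer only ever receives the current model's answer.
        served = predict_current(features)
        try:
            # The candidate sees the same request, but its answer is only logged.
            shadow = predict_candidate(features)
            logger.info("features=%s served=%s shadow=%s", features, served, shadow)
        except Exception:
            # A failing candidate must never affect the consumer.
            logger.exception("candidate model failed; served prediction unaffected")
        return served

    return dispatch


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Toy predictors, for illustration only.
    dispatch = make_dispatcher(lambda f: "approve", lambda f: "reject")
    print(dispatch({"amount": 120}))
```

Note that this sketch calls the candidate synchronously, so the candidate's latency is added to every request; calling it in the background would avoid that at the cost of a bit more plumbing.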
About MAB, it's more complex and I don't even have a deep understanding of the algorithm itself, but an essential point would definitely be a clear ground truth for calculating any desired metric. Thanks for that point too!
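For anyone curious how the ground truth feeds into a bandit, below is a small simulation using Thompson sampling. This is one common MAB algorithm that I picked for brevity, not necessarily the one described in the book. The class `ThompsonBandit`, the helper `simulate` and the accuracy figures are all made up for illustration; the reward is simply 1 when a prediction matches the ground truth and 0 otherwise.

```python
import random
from typing import Dict, List


class ThompsonBandit:
    """Beta-Bernoulli Thompson sampling over a set of model versions (arms)."""

    def __init__(self, arms: List[str], seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self.successes = {arm: 0 for arm in arms}
        self.failures = {arm: 0 for arm in arms}

    def choose(self) -> str:
        # Draw a plausible accuracy for each arm from its Beta posterior
        # (uniform Beta(1, 1) prior) and serve the arm with the highest draw.
        draws = {
            arm: self._rng.betavariate(self.successes[arm] + 1,
                                       self.failures[arm] + 1)
            for arm in self.successes
        }
        return max(draws, key=draws.get)

    def update(self, arm: str, correct: bool) -> None:
        # Called once the ground truth for a served prediction is known.
        if correct:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


def simulate(true_accuracy: Dict[str, float], rounds: int = 2000,
             seed: int = 0) -> Dict[str, int]:
    """Simulate requests and return how many each arm ended up serving."""
    rng = random.Random(seed)
    bandit = ThompsonBandit(list(true_accuracy), seed=seed + 1)
    pulls = {arm: 0 for arm in true_accuracy}
    for _ in range(rounds):
        arm = bandit.choose()
        pulls[arm] += 1
        # Simulated ground truth: the prediction is right with the arm's accuracy.
        bandit.update(arm, rng.random() < true_accuracy[arm])
    return pulls


if __name__ == "__main__":
    # Hypothetical accuracies: the new model is assumed to be slightly better.
    print(simulate({"current": 0.80, "new": 0.90}))
```

In the simulation the better arm gradually receives most of the traffic, which matches the "converging to the best model" behaviour. In a real deployment, `update` can only be called when the ground truth arrives, which may be long after the prediction was served.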
I will still keep this discussion open for a while and see if anybody else in the community brings other valuable insights!
Thank you again!