
Exploring CI/CD in a Machine Learning Project With Dataiku

CoreyS
Community Manager

Originally posted by Dataiku on July 6, 2021.

@fsergot, Senior Product Manager at Dataiku, shared some insight into CI/CD features in Dataiku during the 2021 Dataiku Product Days. This blog post will highlight parts of his session by going over some of the basics of CI/CD and presenting a demo led by François in Dataiku.

→ Watch the Full Session

What Is CI/CD in a Machine Learning Project?

It all starts with the notion of operationalization, which defines how you are going to serve your machine learning (ML) project to your business users. Technically, you can operationalize your projects using methods that range from fully manual to fully automated. This post focuses on the fully automated approach. However appealing full automation seems, it is not wise to set a goal of automating everything. Instead, evaluate each project according to its criticality and the resources you have to determine whether it should be automated.

Full automation in this context means CI/CD, the combined practices of continuous integration and continuous deployment. Continuous integration means merging shared work into a shippable product as often as possible. Continuous deployment means deploying this shippable product as often as possible. Both happen through an automated process.
You can get a more complete understanding of CI/CD from our previous blog post.

Machine learning projects can benefit from CI/CD at many levels, but there are some specificities to consider:

  • Models decay and need to be renewed: Machine learning inherently deals with models that decay over time, so they need to be monitored and retrained. This requires the ability to ship frequent updates.
  • Complexity of dependencies for model deployment: Models depend heavily on data preparation, infrastructure, and the data that you're manipulating. This makes moving models to production a complex operation that can greatly benefit from automation.
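
To make the first point concrete, here is a minimal sketch of a decay check that could trigger retraining. It is not taken from the session: the function name, the choice of AUC as the monitored metric, and the 0.05 threshold are all illustrative assumptions.

```python
from typing import List


def needs_retraining(baseline_auc: float,
                     recent_aucs: List[float],
                     max_drop: float = 0.05) -> bool:
    """Hypothetical decay check (not a Dataiku API).

    Returns True when the mean of the recent AUC measurements has fallen
    more than `max_drop` below the AUC measured at deployment time.
    """
    if not recent_aucs:
        # No monitoring data yet, so there is nothing to act on.
        return False
    recent_mean = sum(recent_aucs) / len(recent_aucs)
    return (baseline_auc - recent_mean) > max_drop


if __name__ == "__main__":
    # Assumed example values: the model scored 0.85 when it was deployed.
    print(needs_retraining(0.85, [0.84, 0.83]))  # small drop
    print(needs_retraining(0.85, [0.78, 0.76]))  # large drop
```

A check like this would typically run on a schedule, with a positive result starting the automated pipeline rather than a manual redeployment.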

 

CI/CD in Action

In this example, we use the Churn Prediction project and see how it can be pushed to production using a fully automated Jenkins pipeline.

François walks us through a project that starts on his Design node and goes through these steps:

  1. Validating and packaging the project
  2. Pushing the project to the test Automation node
  3. Running tests on the project
  4. Moving the project to the production Automation node
  5. Running a smoke test and rolling back if necessary
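
The control flow of these five steps can be sketched in a few lines of Python. This is an illustration only, not the Jenkins pipeline or the code used in the session: the `Node` class, `run_pipeline`, and the three check callables are hypothetical stand-ins, and no real Dataiku or Jenkins API is called.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical stand-in for an Automation node (not a Dataiku class)."""
    name: str
    active_bundle: str = ""
    history: List[str] = field(default_factory=list)

    def activate(self, bundle_id: str) -> None:
        # Remember the bundle being replaced so it can be restored later.
        if self.active_bundle:
            self.history.append(self.active_bundle)
        self.active_bundle = bundle_id

    def rollback(self) -> None:
        # Restore the previously active bundle, if there is one.
        if self.history:
            self.active_bundle = self.history.pop()


def run_pipeline(bundle_id: str,
                 test_node: Node,
                 prod_node: Node,
                 validate: Callable[[str], bool],
                 run_tests: Callable[[Node], bool],
                 smoke_test: Callable[[Node], bool]) -> str:
    # 1. Validate (and, in a real pipeline, package) the project.
    if not validate(bundle_id):
        return "rejected: validation failed"
    # 2. Push the packaged project to the test node.
    test_node.activate(bundle_id)
    # 3. Run the tests there; stop before production on failure.
    if not run_tests(test_node):
        return "rejected: tests failed"
    # 4. Move the project to the production node.
    prod_node.activate(bundle_id)
    # 5. Smoke test in production and roll back if it fails.
    if not smoke_test(prod_node):
        prod_node.rollback()
        return "rolled back: smoke test failed"
    return "deployed"


if __name__ == "__main__":
    test = Node("test")
    prod = Node("prod", active_bundle="v1")
    # Simulate a release whose smoke test fails in production.
    result = run_pipeline("v2", test, prod,
                          validate=lambda b: True,
                          run_tests=lambda n: True,
                          smoke_test=lambda n: False)
    print(result, "| production is on", prod.active_bundle)
```

The point of the sketch is the ordering: production is only touched after the test node passes, and the previous version is kept so a failed smoke test can restore it.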

We will also see some additional thoughts and ideas to help you start such a project.

The step-by-step explanation with code samples used in this video can be accessed here.

Watch the Full Video

Dive deeper into this session, which is especially useful for understanding where the human fits into project validation, packaging, testing, pre-production, and moving to production.

WATCH NOW

