ZS - Using Dataiku to Develop Over 100 ML Models for a Life Sciences Organization

amanjainzs
amanjainzs Partner, Registered, Frontrunner 2022 Participant Posts: 2 Partner

Name:

Aman Jain
Arun Shastri
Sandeep Varma
Anupam Awasthi
Anoop Tripathi
Subbiah Sethuraman
Hemangshu Agarwal
Aditya Sharma
Rahul Sahu
Nikhil Parmar
Pratik Chaudhari
Bharat Sharma
Abhinav Singh

Country: India

Organization: ZS

ZS is a management consulting and technology firm focused on transforming global healthcare and beyond. We leverage our leading-edge analytics, plus the power of data, science, and products, to help our clients make more intelligent decisions, deliver innovative solutions and improve outcomes for all. Founded in 1983, ZS has more than 12,000 employees in 35 offices worldwide. To learn more, visit https://www.zs.com/.

Awards Categories:

  • Value at Scale
  • Most Impactful Transformation Story
  • Partner Acceleration

Business Challenge:

ZS partnered with a life sciences and pharmaceutical organization to help orchestrate 100+ data science models projected to bring in annual incremental growth of USD$200M+. They had a clear roadmap, the correct set of people, and the right technology in place to develop state-of-art data science models but lacked the expertise and thought leadership to effectively operationalize and maintain these models in production.

There were instances where data scientists reported challenges with the computation power of the existing platform. Loading python libraries took two to three weeks, making it extremely difficult for the Data Scientists to scale the POC ML (Machine Learning) models. Typically, it took over three months to successfully deploy the model.

This sparked discussions on how to make things better across teams by bringing in greater efficiency and industry best practices. A parallel capability was needed to maximize the benefits of investments in AI.

MLOps (Scaled AI) came into the picture when Dataiku was chosen as the central platform for all model deployments. ZS Scaled AI, ESCoE, and ADS teams worked in conjecture with the Dataiku team (including field engineers) to verify infrastructure architecture and understand the capability roadmap of the platform.

Today, over 10 models have been deployed on the Dataiku platform with added MLOps utilities like model health monitoring, data drift analysis, and many more.

Business Solution:

ZS helped the client compare multiple platforms before Dataiku was chosen to be the platform for MLOps. The decision was taken based on the following points:

  • Dataiku was a user-friendly option for multiple personas, including data scientists, business analysts, and domain subject matter experts.
  • Enhancements offered by Dataiku in Dataiku 10 over Dataiku 9, such as ranger policies, were of the utmost importance.

The team later added a couple of additional steps to make Dataiku more powerful for cases such as:

  • Powering it with EKS (Elastic Kubernetes Service) based infrastructure to provide scalable compute per the development team's need while providing support for GPU for more advanced implementations, making the platform future-proof for newer capabilities.
  • Integrating it with Azure DevOps for better code management, conflict resolutions, and so on.
  • Integrating SonarQube within the CI pipelines for setting configurable standards for code quality management and improvement recommendations.
  • Adding custom python-based model performance monitoring, drift monitoring modules, and so on.

More than 50 people across multiple teams and various domains collaborated to build a platform estimated to be used by over 1,000 people across the organization once a 100% solution is deployed. Below is a detailed infrastructure diagram to show how the team leveraged Dataiku to its fullest and how it is core to MLOps' capability.

image.png

Business Area: Analytics

Use Case Stage: In Production

Value Generated:

Direct financial impact:

  1. MLOps platform will help generate USD$300M+ additional revenue for the client's Commercial business.
  2. The model run time had reduced by ~75% post the operationalization of models, which translates to an estimated USD$1M in direct cloud service spending – plus, it has an environmental advantage since the carbon footprint is reduced.

Indirect impact:

The lean structure of the platform makes it easy to manage post 100% deployment. By 2023 we estimate a lean team of 3FTEs managing over 100 models, which today is managed by a team of 25+FTEs across the organization. The estimated efficiency gain of ~90% (Indirectly a significant impact – given we might end up replacing the current admin work).

Below are a few details on how we achieved such process efficiency:

1. Platform Automation: Earlier, the model was scored manually by running individual script(s) monotonous and repetitively. ZS saw this as an opportunity and stitched the entire Dataiku recipe flow into a one-click-based execution; it orchestrates the entire process at the click of a single button. We even introduced automated schedulers that trigger the model training using the latest data, validate model accuracy, execute the inferencing, and provide the notification to the stakeholders. Thus, providing the team with end-to-end visibility into the model lifecycle.

image.png

2. Model Runtime improvement: Earlier manual execution models involved multiple handoffs and inefficiencies because they were shared as ZIP files for deployment. By enabling One Click Execution of model prediction and One Click Deployment leveraging Azure DevOps, our team reduced model runtime from 48 hours to as little as six hours, thereby drastically reducing the model go-to-market time from six months to six weeks.

image.png

Value Brought by Dataiku:

Before Dataiku was chosen as the platform for development, data scientists used multiple tools and technology stacks for ad hoc development, including Jupyter Notebook, RStudio, LiveRamp, and more. This distributed technology stack limited the scope of collaboration and model reusability across the teams.

Our team brought in Dataiku as a centralized platform for development and was able to portray its value from the very beginning. With the ease of development using a graphical UI-based implementation, data scientists could easily adapt and migrate to the platform. As a result of this, the utilization/productivity of the data science team went up by almost 40%.

Dataiku also provides some out-of-the-box capabilities which added value at various stages:

  1. Collaboration: Data scientists can work on the same project collaboratively with logically separate zones to speed up the development. It supports changing the underlying compute and configuration for executions, tracking, and maintaining lineage with logs about changes executed.
  2. Integrations: Dataiku provides support for Python, R, Spark, and so on, while providing customization for coding environments according to project needs. The capability to create reusable functions with different programming languages in libraries and the flexibility to create the ensemble algorithms per the project requirements is truly empowering for the data science teams.
  3. Security and Governance: It supports enterprise-level security managing access to data and data projects, including support for user impersonation for full traceability and compliance. Dataiku also brings the ability to centralize the data effort, allowing for model risk validation and processes to scale, thanks to a single end-to-end system housing data projects from data ingestion to deployment in production.
  4. Upskilling: We extensively leveraged Dataiku documentation and Academy to understand different tools, functionalities, and implementations. Our team is even certified in various tracks of Dataiku Certification to build credibility with the knowledge gained.

Model and Data monitoring dashboards:

image.png

image.png

Value Type:

  • Improve customer/employee satisfaction
  • Reduce cost
  • Reduce risk
  • Save time

Value Range: Dozens of millions of $

Setup Info
    Tags
      Help me…