ZS Associates - Efficiently Managing Hundreds of Models in Production

amanjainzs Partner, Registered, Frontrunner 2022 Participant Posts: 2 Partner

Team members:

Aman Jain, Engineering Lead, with:

  • Arun Shastri
  • Anupam Awasthi
  • Sandeep Verma
  • Anoop Tripathi
  • Subbiah Sethuraman
  • Hemangshu Agarwal
  • Aditya Sharma
  • Rahul Sahu
  • Nikhil Parmar
  • Pratik Chaudhari
  • Raghava Krishna
  • Rohit Garg
  • Sriniketh Shankar
  • Abhinav Singh

Country: India

Organization: ZS Associates

ZS is a management consulting and technology firm focused on transforming global healthcare and beyond. We leverage our leading-edge analytics, plus the power of data, science, and products, to help our clients make more intelligent decisions, deliver innovative solutions and improve outcomes for all. Founded in 1983, ZS has more than 12,000 employees in 35 offices worldwide. To learn more, visit https://www.zs.com/.

Awards Categories:

  • Best MLOps Use Case
  • Best ROI Story

Business Challenge:

ZS collaborated with a leading life-science organization to address the challenge of effectively implementing and maintaining multiple ML models in the production environment. While the organization had the right roadmap, the right personnel, and the right tools for developing cutting-edge models, it still struggled to get the full value out of them.

Key Challenges:

1. Prolonged Model Deployment: Deploying a model typically took over three months, which delayed the realization of model benefits and hindered the organization's ability to make timely, data-driven decisions in response to market dynamics.

2. Limited Computational Power: On the existing platform, loading Python libraries was slow (~2-3 weeks), which severely hampered the team’s ability to scale POC implementations and hindered impact exploration.

3. Data Science and ML Capability Enhancements: The organization also needed to enhance its data science and machine learning capabilities, including:

  • Computational Power: A scalable and efficient environment for model deployment and orchestration to address the computational power challenges.
  • Code Quality and Reliability: Improve code quality and reliability by detecting bugs, vulnerabilities, and maintainability issues early in the development process.
  • Model Performance and Accuracy: Ensure that deployed models remain performant and accurate over time.
  • Data Drift and Input Changes: Detect changes in input data patterns and proactively ensure model accuracy and reliability.
  • Effective Use Case Management: Manage and monitor different use cases separately while maintaining governance and scalability.
  • Real-Time Pipeline Monitoring: Provide real-time updates on the status and outcomes of data processing tasks so the team can respond promptly to pipeline failures and minimize downtime.
  • Delivering Data-Driven Insights: Integrate machine learning models and visualizations into web applications to deliver intuitive and interactive data-driven insights and predictions through user-friendly dashboards.

The integration of MLOps using Dataiku as the platform aimed to streamline operations, optimize model deployment, and enhance the organization's overall data science capabilities.

Business Solution:

Dataiku, a central platform for all model deployments, was chosen to anchor the MLOps implementation. Collaborative efforts between ZS and the Dataiku platform team, including field engineers, validated the infrastructure architecture and platform capabilities to align with the organization's requirements and future goals.

Dataiku’s integration with Amazon EKS significantly enhanced the firm's data science and ML capabilities, offering a scalable and efficient environment for model deployment and orchestration. This integration allowed the organization to deploy and manage models seamlessly, addressing the computational power challenge encountered earlier. SonarQube, a widely used static code analysis tool, was integrated into the development pipeline to further enhance code quality and reliability, and enable early detection of bugs, vulnerabilities, and maintainability issues in the codebase.

By detecting these issues early, the team was able to deliver higher-quality code and reduce the risks associated with code deficiencies. Dataiku's MLOps utilities, such as model health monitoring and data drift analysis, contributed to the overall initiative's success. Model health monitoring ensured sustained model performance and accuracy, providing continuous value. Data drift analysis proactively addressed changes in input data patterns, helping ensure model accuracy and reliability.
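To make the idea of data drift concrete, here is a minimal sketch of one common drift measure, the Population Stability Index (PSI), for a single numeric feature. This is not Dataiku's drift analysis and does not use the Dataiku API; the function name, the bin count, and the `eps` floor are all illustrative assumptions.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    bins: int = 10,      # assumed bin count, chosen for illustration
    eps: float = 1e-6,   # floor so empty bins do not produce log(0)
) -> float:
    """PSI between a reference sample and a current sample of one feature."""
    lo, hi = min(reference), max(reference)
    # Equal-width bins over the reference range; fall back to width 1.0
    # when the reference column is constant.
    width = (hi - lo) / bins or 1.0

    def bin_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp outliers into edge bins
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    ref_shares = bin_shares(reference)
    cur_shares = bin_shares(current)
    # Every term is non-negative; the sum is 0 when the shares are identical.
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_shares, cur_shares))


training_sample = [i / 100 for i in range(100)]
shifted_sample = [0.5 + i / 200 for i in range(100)]

print(population_stability_index(training_sample, training_sample))  # no drift
print(population_stability_index(training_sample, shifted_sample))   # clear drift
```

A monitoring job would typically compare the score against an alert threshold chosen by the team; any such threshold is a judgment call rather than something stated in this write-up.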

Dataiku facilitated predictive analytics across various domains, automating tasks such as sales forecasting, customer churn prediction, demand forecasting, fraud detection, and sentiment analysis. By deploying these models directly from Dataiku, the organization was able to automate and operationalize these predictive analytics tasks, rapidly improving decision-making and business outcomes. Real-time email notifications improved pipeline monitoring by promptly alerting the team to failures. Because each notification contains vital information about the failure, it also helps with root cause analysis. The implementation significantly enhanced pipeline monitoring and notification capabilities, ensuring prompt response to issues and minimizing downtime.
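As an illustration of what a failure notification carrying root-cause details might look like, the sketch below builds an email message with Python's standard library. It is not the team's actual notification setup; the function name, the sender address, and the pipeline and step names are hypothetical.

```python
from datetime import datetime, timezone
from email.message import EmailMessage
from typing import Optional, Sequence


def build_failure_email(
    pipeline: str,
    step: str,
    error: str,
    recipients: Sequence[str],
    sender: str = "mlops-alerts@example.com",  # hypothetical sender address
    failed_at: Optional[datetime] = None,
) -> EmailMessage:
    """Compose (but do not send) an alert describing a pipeline failure."""
    failed_at = failed_at or datetime.now(timezone.utc)
    msg = EmailMessage()
    msg["Subject"] = f"[FAILED] {pipeline}: step '{step}'"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    # The body carries the details someone needs to start root cause analysis.
    msg.set_content(
        f"Pipeline: {pipeline}\n"
        f"Failed step: {step}\n"
        f"Time (UTC): {failed_at.isoformat()}\n"
        f"Error: {error}\n"
    )
    return msg


alert = build_failure_email(
    pipeline="sales_forecast",          # hypothetical pipeline name
    step="score_latest_data",           # hypothetical step name
    error="KeyError: 'region'",
    recipients=["data-science@example.com"],
)
print(alert["Subject"])
print(alert.get_content())
```

Sending the message could then be done with `smtplib.SMTP(...).send_message(alert)` against whatever mail relay the organization uses.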

Lastly, Dataiku's compatibility with Streamlit enabled seamless integration of ML models and visualizations into web applications, delivering intuitive and interactive data-driven insights and predictions through user-friendly dashboards and web-based data applications. Streamlit empowered users to interact with the models and explore data insights in an intuitive and engaging manner, enhancing the overall user experience. These efforts resulted in more accurate predictions, streamlined processes, and data-driven decision-making, contributing to the organization's annual incremental growth.

Day-to-day Change:

1. Streamlined Data Management: Dataiku's comprehensive platform simplifies data management from various sources, enabling efficient data access and analysis, saving time on data preparation.

2. Enhanced Collaboration: Dataiku's collaborative features foster better communication and coordination among teams, including data scientists, analysts, and decision-makers, resulting in more effective and efficient project execution.

3. Automated Processes: By automating many aspects of data processing and analysis, Dataiku has allowed teams to focus on more strategic tasks, leading to increased productivity and faster insights.

4. Improved Decision-Making: Insights from Dataiku's reports drive effective decision-making backed by data, benefiting our organization's strategies.

5. Increased Agility: Dataiku’s ability to quickly replicate and scale projects has enhanced our organization’s agility in responding to new data needs or changes in the business environment.

Business Area Enhanced: Marketing/Sales/Customer Relationship Management

Use Case Stage: In Production

Value Generated:

With MLOps in operation, the organization observed improvements across several dimensions. Model run time fell by ~75% after operationalization, which translated to an estimated USD 1M in savings on direct cloud service spend. It also brings an environmental benefit, since the carbon footprint is reduced.

Additionally, the lean structure of the platform enables 100% deployment management. By 2023, a lean team of 3 FTEs is estimated to handle 100+ models, a significant improvement compared to the current team of 25+ FTEs from Enterprise, IT, and Admin teams. This translates to an estimated efficiency gain of ~90% (with potential replacement of current Admin work).
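The efficiency estimate can be checked with simple arithmetic. The sketch below uses the headcounts quoted above; because the text says "25+" and "100+", the inputs are lower bounds, so the results are lower bounds too.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


fte_before = 25   # "25+" in the text, so a lower bound
fte_after = 3
models = 100      # "100+" in the text, also a lower bound

fte_reduction = percent_reduction(fte_before, fte_after)
models_per_fte = models / fte_after

print(f"FTE reduction: {fte_reduction:.0f}%")       # FTE reduction: 88%
print(f"Models per FTE: {models_per_fte:.1f}")      # Models per FTE: 33.3
```

With 25 FTEs as the baseline the reduction is 88%, and it rises as the baseline grows beyond 25, which is consistent with the ~90% estimate.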

Measures underway to translate the thought into reality:

1. Platform Automation

Previously, the model was scored manually by running individual scripts, a monotonous and repetitive process. ZS saw this as an opportunity and stitched the entire Dataiku recipe flow into a one-click execution that orchestrates the whole process. We also introduced automated schedulers that trigger model training on the latest data, validate model accuracy, execute inferencing, and notify stakeholders. This gives the team end-to-end visibility into the model lifecycle.
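The scheduled cycle described above (train, validate accuracy, score, notify) can be sketched in plain Python as follows. This is only an outline of the control flow under stated assumptions: it does not use the Dataiku API, and the function name, the `RunReport` type, and the 0.8 accuracy gate are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class RunReport:
    accuracy: float
    scored: bool
    messages: List[str] = field(default_factory=list)


def run_scheduled_cycle(
    train: Callable[[], Any],
    evaluate: Callable[[Any], float],
    score: Callable[[Any], None],
    notify: Callable[[str], None],
    min_accuracy: float = 0.8,  # assumed quality gate, not from the source
) -> RunReport:
    """One cycle: retrain, check accuracy, score only if the gate passes."""
    model = train()
    accuracy = evaluate(model)
    report = RunReport(accuracy=accuracy, scored=False)

    if accuracy >= min_accuracy:
        score(model)
        report.scored = True
        report.messages.append(f"Scoring completed (accuracy {accuracy:.2f}).")
    else:
        # Skip inferencing so a degraded model never reaches stakeholders.
        report.messages.append(
            f"Scoring skipped: accuracy {accuracy:.2f} is below {min_accuracy:.2f}."
        )

    for message in report.messages:
        notify(message)
    return report


# Usage with stand-in callables; real steps would train and score a model.
report = run_scheduled_cycle(
    train=lambda: "model-v2",
    evaluate=lambda model: 0.91,
    score=lambda model: None,
    notify=print,
)
```

Passing the steps in as callables keeps the gate logic testable on its own; a scheduler would simply call `run_scheduled_cycle` at whatever interval the team chooses.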

2. Model Runtime improvement

Previously, manually executed models involved multiple handoffs and inefficiencies because they were shared as ZIP files for deployment. One-click execution and deployment leveraging Azure DevOps reduced model runtime from 48 hours to just six hours. It also reduced model go-to-market time from six months to six weeks, enhancing overall efficiency.

3. MLOps Efficiency Gains

MLOps brings efficiency to the ML process through automation, standardization, and collaboration, enabling faster time to market, improved scalability, and optimized resource utilization. It ensures reliable model performance, risk mitigation, and compliance, empowering data scientists to focus on value creation. In short, MLOps leads to accelerated deployments, streamlined processes, and impactful outcomes.

4. MLOps Workshops for DS

MLOps workshops empower data scientists with knowledge, tools, and best practices to optimize model development and deployment processes. Embracing MLOps principles enhances efficiency, collaboration, and alignment with business goals, enabling faster and more impactful model deployments.

Value Brought by Dataiku:

Before Dataiku, data scientists used multiple ad hoc development tools such as Jupyter Notebook, RStudio, and standalone Python scripts, which limited collaboration and model reusability. We brought in Dataiku as a centralized platform.

With its graphical UI-based implementation, data scientists adapted and migrated easily, adding value at various stages. As a result, the utilization/productivity of the data science team rose by nearly 40% almost immediately.

Dataiku provides several out-of-the-box capabilities that added value at various stages:

1. Collaboration: Data scientists can work collaboratively in logically separate zones, speeding up development. Dataiku also allows changing compute and configuration for executions, tracking lineage, and maintaining logs of executed changes.

2. Integrations: Dataiku supports Python, R, Spark, etc., with customization options for coding environments as needed. Reusable functions written in different programming languages and stored in libraries, together with the flexibility to create ensemble algorithms, empower the development teams.

3. Security and Governance: Dataiku supports enterprise-level security, managing access to data and data projects, including user impersonation for full traceability and compliance. Centralizing data efforts allows model-risk validation and process scaling, thanks to a single end-to-end system housing data projects from data ingestion to deployment in production.

4. Reporting: We leveraged Dataiku’s reporting to gain insights into model performance and data health.

5. Improved Decision-Making: Dataiku's comprehensive reports can provide valuable insights that drive data-driven decision-making. This can lead to more effective strategies and improved outcomes.

6. Efficiency: Dataiku automates various aspects of data processing and analysis, which saved time and resources and led to faster insights.

7. Model Management: Utilizing model tracking and retraining, Dataiku maintains clear records of AI model development and usage, enhancing the reliability and effectiveness of AI initiatives.

8. Collaboration: Dataiku’s collaborative environment enabled data scientists, analysts, and decision-makers to work on holistic and effective models and strategies.

9. Upskilling: Leveraging Dataiku documentation and Academy, the team gained expertise in different tools and functionalities, earning Dataiku Certification to build credibility with acquired knowledge.

Value Type:

  • Increase revenue
  • Reduce cost
  • Save time

Value Range: Dozens of millions of $
