Financial Services Institution - Efficient Deployment of Compliance Models to Support Field Teams


Name: Christie Hampton, Machine Learning Engineer

Country: United States


Wealth management services provide portfolio management, investment advice, and research services to help individuals achieve their financial goals. My current employer (The Firm) ranks among the Fortune 500 as a leading financial services institution and is committed to improving the lives of its clients, associates, and the communities it serves.

Awards Categories:

  • Best Acceleration Use Case
  • Best MLOps Use Case
  • Best Approach for Building Trust in AI
  • Best ROI Story

Business Challenge:

The Firm places a strong emphasis on securing and frequently monitoring the sensitive data, confidential records, and trade transactions of more than 8 million clients. The Firm is also committed to implementing data governance standards and promoting ethical, explainable machine learning and AI solutions, and it enforces daily compliance checks on all data assets to prevent market manipulation, assess risk, and protect clients from potential financial exploitation.

With millions of financial transactions generated each day across client accounts, Compliance teams within The Firm rely on numerous machine learning models deployed in production (real-time, daily, and monthly batch scoring) to identify and predict indicators of potential account-level risk. These models support more than 125 Field Supervision Agents with their hourly, daily, or monthly compliance checks.

Some of The Firm's core challenges in deploying compliance models are outlined below:

  • Historically, data scientists within The Firm leveraged several desktop open-source solutions for building compliance machine learning models and often faced challenges with deploying, governing, and implementing performance monitoring for these models at scale.
  • Optimizing the model code and deploying some of these complex open-source models, which draw data from various source systems, required nearly 1,440 standard deployment hours (nearly 8 months) from tech support engineers.
  • The long deployment times for ML models residing on desktop solutions led to a low return on investment and made it hard to generate continuous value from compliance ML models.
  • Because compliance models resided within desktop open-source solutions, it was challenging for MLOps teams to fully embrace Agile and DevOps methodologies for version control and to build end-to-end automated continuous integration and continuous deployment (CI/CD) machine learning pipelines.
  • Models built on desktop solutions made it difficult to encourage cross-functional project collaboration between data scientists and machine learning engineers.
  • Additionally, daily batch scoring compliance ML models require complex end-to-end pipelines. These classification models must generate batch scores daily across all client accounts, and each day's prediction scoring dataset must be compared against the prior day's dataset to assess a specified % change or detect the presence of a new account. These conditional pipelines are designed to perform specific actions based on the threshold levels and % change results detected within the model output. Building such conditional pipelines to support compliance teams was not possible with models residing within desktop solutions.

    Figure 1.0 - Ex. of Stakeholder Requested Batch ML Pipeline

Business Solution:

In spring 2022, The Firm formed an MLOps team to collaborate with data scientists and cross-functional departments on the end-to-end process of operationalizing machine learning models within Dataiku. As a Machine Learning Engineer, I have outlined below some of the initiatives undertaken to address the challenges previously experienced with deploying compliance ML models.

Dataiku Solution #1 – Implementing Agile Model Deployment Techniques Leveraging Dataiku

  • MLOps processes and standardized playbooks were developed based on the capabilities released in Dataiku v10 and v11. ML Engineers on the team routinely use more than 164 new functionalities from these releases, many of which are incorporated into our team's standard operating procedures.
  • Dataiku allowed ML engineers to successfully migrate compliance models from prior desktop solutions into Dataiku. ML Engineers leveraged Dataiku's GitHub connectivity to perform all code migrations, which allowed compliance models to be rebuilt within this enterprise solution for proper versioning and project governance.
  • Dataiku has allowed me to significantly reduce the time it takes to optimize model code and build CI/CD pipelines for production deployment. Other ML Engineers on the team and I have reduced the time to deployment by nearly 90% compared to model deployments from the prior desktop solutions.
  • With the improved efficiency in the rate of deploying compliance models within Dataiku, business stakeholders have been able to receive significant business value from models moved into production.


Dataiku Solution #2 – Utilizing the Govern Node

  • To align with the data governance processes established within The Firm, the MLOps team began utilizing Dataiku's Govern Node. The Govern Node is utilized in multiple capacities:
  • The tool provides visibility into all compliance models that were previously deployed and the MLOps team can monitor the performance evaluation metrics of all models.
  • The Govern Node is also utilized for obtaining stakeholder sign-off and for storing all model documentation.
  • With several dozen models deployed in production, the Govern Node's model registry capabilities are essential for managing the entire portfolio of compliance models.

    Figure 4.0 - Utilizing the Govern Node to Implement Governance

Dataiku Solution #3 – Conditional Pipeline Development to Support Compliance

  • Using Dataiku's Python recipes and the Dataiku API, I'm able to write Python scripts that customize conditional pipelines and control the flow of data between managed folders. This allows me to satisfy the specifications for comparing daily batch scoring results against prior batch scoring results to adhere to compliance standards.

    Figure 5.0 - Leveraging Python Recipe to Build Customized Conditional Pipelines

    Figure 6.0 - MLOps Existing Deployment Infrastructure
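
The routing step of such a conditional pipeline might look like the sketch below. It is a stand-in rather than the recipe shown in the figures: local directories play the role of managed folders so the example runs anywhere, and `route_scores`, the `review`/`archive` folder roles, and the `scores_<run_date>.csv` file naming are all hypothetical. Inside a Dataiku Python recipe, the reads and writes would presumably go through managed-folder handles instead of local paths.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, Tuple


def route_scores(
    scores: Dict[str, float],
    flagged_ids: Iterable[str],
    review_dir: Path,
    archive_dir: Path,
    run_date: str,
) -> Tuple[Path, Path]:
    """Write flagged accounts to review_dir and all other accounts to archive_dir."""
    flagged = set(flagged_ids)
    targets = {
        True: Path(review_dir) / f"scores_{run_date}.csv",    # needs supervision
        False: Path(archive_dir) / f"scores_{run_date}.csv",  # no action required
    }
    # Split the day's scores into the two destinations.
    rows = {True: [], False: []}
    for account_id, score in sorted(scores.items()):
        rows[account_id in flagged].append((account_id, score))
    # Write one CSV per destination, creating the folder if it is missing.
    for is_flagged, path in targets.items():
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["account_id", "score"])
            writer.writerows(rows[is_flagged])
    return targets[True], targets[False]
```

Keeping the routing decision in one function means the threshold logic that produces `flagged_ids` can change without touching the file handling.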

Day-to-day Change:

Dataiku has played a tremendous role in scaling our MLOps practice; The Firm's MLOps team was built around deploying models with Dataiku. Below are some additional examples of how Dataiku has changed our day-to-day operations:

  • Leveraging Dataiku allowed The Firm to stand up a new MLOps team within a few short months and hire staff who were ready to deploy ML models after several days of studying technical documentation, Dataiku knowledge tutorials, etc.
  • Dataiku's visual recipes and graphical interface enable ML Engineers to deploy end-to-end pipelines while writing up to 75% less pipeline production code. This has also contributed to the increased deployment of compliance models.
  • Dataiku has significantly enhanced our ability as engineers to implement self-model explainability techniques within our projects. The visual interface of the project flow and the KPI performance metrics we create within dashboards have equipped our stakeholders and business customers to easily interpret, understand, and review machine learning models.
  • Implementing tools such as the Govern Node has allowed our team to manage a portfolio of several dozen machine learning models in production, while encouraging peer reviews and stakeholder signoffs.
  • Leveraging Dataiku has enabled The Firm to create a Dataiku user group (DUG), which brings together users from various spoke teams throughout The Firm to encourage collaboration.

    Figure 7.0 - My Day-to-Day Operational Journey

Business Area Enhanced: Risk/Compliance/Legal/Internal Audit

Use Case Stage: In Production

Value Generated:

Leveraging Dataiku as The Firm's end-to-end model deployment solution has allowed engineers to quickly deliver mission-critical workloads at scale to the business community (more than 125 stakeholders) for detecting potential risk, improving informed decision making, and reducing the manual effort required of Field Supervision Agents.

Dataiku has enabled ML Engineers to increase The Firm's overall efficiency rate for deploying compliance models into production by more than 900%.

The standard deployment workload hours for The Firm drastically decreased from 1,440 hours of deployment work (prior to leveraging the MLOps team to operationalize models within Dataiku), down to less than 160 hours for compliance model deployments.
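
As a quick check on the figures above, the reduction and speedup implied by the two hour counts can be computed directly. The function name `deployment_savings` is hypothetical, and since the post says "less than 160 hours", the results are lower bounds.

```python
from typing import Tuple


def deployment_savings(hours_before: float, hours_after: float) -> Tuple[float, float]:
    """Return (percent reduction in hours, speedup ratio), each rounded to 1 decimal."""
    reduction_pct = (hours_before - hours_after) / hours_before * 100.0
    speedup = hours_before / hours_after
    return round(reduction_pct, 1), round(speedup, 1)


reduction, speedup = deployment_savings(1440, 160)
print(reduction, speedup)  # 88.9 9.0
```

An 88.9% reduction matches the "nearly 90%" quoted earlier. A ratio of at least 9 is one way to read the "more than 900%" efficiency figure, though that reading is an assumption.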

ML Engineers have experienced more than an 86% reduction in the time spent optimizing and refactoring model code for production by leveraging many of Dataiku's visual ML capabilities. The level of effort required to deploy various compliance use cases has also decreased by nearly 66%.

Value Brought by Dataiku:

Dataiku is a quintessential analytics solution that encourages cross-functional collaboration between analysts, scientists, engineers, and other technical professionals. The tool lets users create and deploy just about any analytical or AI solution, either through its highly intuitive visual interface or by coding in various programming languages with code recipes. The solution places a strong emphasis on scaling analytics and data science workloads quickly and efficiently from inception through production, with additional capabilities outlined below:

  • End-to-End Platform Solution – The tool can be used for classical statistical techniques or scaled significantly to support NLP or LLM workloads.
  • Explainable AI – Dataiku is committed to offering users explainable AI solutions. All machine learning models generated within the tool provide easy-to-interpret metrics, metadata, explanatory reading tips, or links to additional guidelines.
  • Model Evaluation – There are numerous ways, either visually or through code, to evaluate or display the performance of machine learning models. Users can create interactive web apps, publish model output to dashboards or static insight reports, or export datasets or model results through various third-party solutions.
  • Governance – Dataiku provides several capabilities to enforce project governance. Users can implement the Govern Node to ensure that stakeholder feedback and the appropriate signoffs are obtained prior to model deployment. The Govern Node allows users to monitor all governed or ungoverned Dataiku projects in production.
  • MLOps – Dataiku provides numerous capabilities for building robust end-to-end CI/CD pipeline deployments and supports all MLOps use cases.
  • Knowledge Sharing – Dataiku gives users the opportunity to enroll in one of several academy programs to upskill in data analytics, data science, or MLOps. The solution provides a plethora of tutorials and documentation with step-by-step instructions for using Dataiku every day.

Value Type:

  • Improve customer/employee satisfaction
  • Increase revenue
  • Save time
  • Increase trust

Value Range: Dozens of millions of $
