Roche - Improving Patient Safety by Predicting Data Monitoring Needs on Clinical Studies

Team members:

  • Data Science business partners – IDMC Specialists
    • Dee Di-Tommaso
    • Shaylea Rowlett
    • Kalani Kotrys
    • Omar Hernandez
  • Data Science business partners – Business Predictive Analytics
    • Nicolas Delporte
    • Brian Tung
  • Clinical operations analytics lead
    • Danah Albaaj

Country: United States

Organization: Roche

Roche is a global pioneer in pharmaceuticals and diagnostics focused on advancing science to improve people’s lives. The combined strengths of pharmaceuticals and diagnostics under one roof have made Roche the leader in personalized healthcare – a strategy that aims to fit the right treatment to each patient in the best way possible.

Founded in 1896, Roche is the world’s largest biotech company, with truly differentiated medicines in oncology, immunology, infectious diseases, ophthalmology, and diseases of the central nervous system. Roche is also the world leader in in-vitro diagnostics and tissue-based cancer diagnostics and a frontrunner in diabetes management.


Awards Categories:

  • Best Acceleration Use Case
  • Best MLOps Use Case
  • Best Moonshot Use Case
  • Best Approach for Building Trust in AI


Business Challenge:

Genentech Inc., a member of the Roche group, is currently developing a business predictive analytics capability to democratize data science across the organization. This capability aims to help groups and functions with unmet data analytical needs that may not have the necessary bandwidth, knowledge, and resources. Our goal is to establish a framework that jumpstarts data analysis and enables self-service, allowing these groups to unlock insights from their data. To support this goal, we have selected Dataiku as a key component, based on its ability to be leveraged by business and operations users with non-technical backgrounds.

During our work, we collaborated with the team responsible for setting up and managing Independent Data Monitoring Committees (IDMC). An IDMC is formed to monitor identified potential risks to patient safety in Roche clinical study protocol data as part of Roche’s approval process with regulatory authorities. The IDMC team is responsible for identifying key opinion leaders to form the committee and for establishing it no later than the first data review following the first patient dose. An IDMC is typically required for about 20% of the roughly one hundred new studies conducted every year. Despite experience and established guidelines, anticipating when an IDMC request will occur can be challenging.

To address this issue, we have proposed creating a predictive model based on a set of study criteria that can determine the need for an IDMC. This model can help streamline the process and ensure that committees are established in a timely and effective manner.


Business Solution:

  • Training the model

We began the IDMC Predictive Model project by collecting relevant historical study data, spanning over a decade, from various siloed systems, including master data management, clinical trial management, R&D planning, and operational reporting. By leveraging Dataiku, we were able to consolidate all of this information into a single data flow, engineer the necessary features, and train the model. Beyond centralizing information, this allowed us to develop strategies to approximate missing values from other available data when standard attributes such as patient enrollment numbers or First Patient, First Dose dates were not supplied.
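As an illustration only, one way to approximate missing values from other available data is to borrow medians from comparable studies. The sketch below is not the team's actual method: the field names (phase, activation_date, fpfd_date, enrollment), the choice of "same phase" as the peer group, and the sample records are all hypothetical assumptions.

```python
from datetime import date, timedelta
from statistics import median

# Hypothetical study records; the field names are illustrative only.
studies = [
    {"study_id": "S1", "phase": "II", "activation_date": date(2019, 1, 10),
     "fpfd_date": date(2019, 3, 11), "enrollment": 120},
    {"study_id": "S2", "phase": "II", "activation_date": date(2019, 6, 1),
     "fpfd_date": date(2019, 7, 1), "enrollment": 80},
    {"study_id": "S3", "phase": "II", "activation_date": date(2020, 2, 1),
     "fpfd_date": None, "enrollment": None},
]


def impute_missing(records):
    """Return copies of the records with gaps filled from same-phase peers."""
    filled = []
    for rec in records:
        peers = [r for r in records if r is not rec and r["phase"] == rec["phase"]]
        new = dict(rec, imputed_fields=[])

        # Enrollment: median enrollment of peers that reported a value.
        if new["enrollment"] is None:
            known = [r["enrollment"] for r in peers if r["enrollment"] is not None]
            if known:
                new["enrollment"] = median(known)
                new["imputed_fields"].append("enrollment")

        # First Patient, First Dose: activation date plus the median peer lag.
        if new["fpfd_date"] is None:
            lags = [(r["fpfd_date"] - r["activation_date"]).days
                    for r in peers if r["fpfd_date"] is not None]
            if lags:
                new["fpfd_date"] = new["activation_date"] + timedelta(days=round(median(lags)))
                new["imputed_fields"].append("fpfd_date")

        filled.append(new)
    return filled


for row in impute_missing(studies):
    print(row["study_id"], row["enrollment"], row["fpfd_date"], row["imputed_fields"])
```

Recording which fields were imputed keeps approximated values distinguishable from supplied ones, which can matter when the predictions are later reviewed with subject matter experts.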

Certain relevant attributes couldn't be leveraged due to insufficient historical data. As a next step, we could envision using NLP capabilities to derive this information from protocols. We are also considering the use of external data sources. Currently, the model is built exclusively on Roche data, but we could incorporate external data sources and information from health authorities. While we anticipate facing similar missing data issues, we are confident this would help improve the model.

  • Performance monitoring

To reduce the risk of overfitting, we decided to use the end of 2021 as the cut-off date between the training and test datasets. We maintained a list of studies activated after 2021, showed what the model would predict for them based on the most up-to-date information, and identified any false predictions. This approach has been instrumental in refining the behavior of the model, as well as in transparently demonstrating its performance to the users. Moving forward, we envision using a rolling cut-off date for the training set to regularly update the model with more historical data.
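The cut-off approach can be sketched as a simple date-based split. This is a minimal illustration, not the project's implementation: it assumes that the study activation date is the field compared against the cut-off, and the record structure and function names are hypothetical.

```python
from datetime import date

# End of 2021, the cut-off described above.
CUTOFF = date(2021, 12, 31)


def temporal_split(records, cutoff=CUTOFF, date_key="activation_date"):
    """Split records into (train, holdout) by comparing a date field to the cut-off."""
    train = [r for r in records if r[date_key] <= cutoff]
    holdout = [r for r in records if r[date_key] > cutoff]
    return train, holdout


def rolling_cutoffs(first_year, last_year):
    """Yield one year-end cut-off per year, for periodic retraining."""
    for year in range(first_year, last_year + 1):
        yield date(year, 12, 31)


# Hypothetical records used only to show the behavior of the split.
records = [
    {"study_id": "A", "activation_date": date(2020, 5, 1)},
    {"study_id": "B", "activation_date": date(2021, 12, 31)},
    {"study_id": "C", "activation_date": date(2022, 1, 1)},
]

train, holdout = temporal_split(records)
print([r["study_id"] for r in train], [r["study_id"] for r in holdout])
```

Splitting by date rather than at random means the model is always evaluated on studies that started after everything it was trained on, which mirrors how it is used in practice.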


Day-to-day Change:

As of this writing, the model is in hyper-care mode and rapidly progressing toward the Minimum Viable Product (MVP) stage. Although the model is not yet fully automated or rolled out to production, the IDMC team is already leveraging its output, thanks to real-time predictions. They pull upcoming study records with IDMC predictions published to a Tableau dashboard. Those predictions are ranked by probability, based on the information captured in the early development stages of the study. The resulting dashboard view acts as a sort of radar that allows the team to narrow down the number of potential IDMCs and optimize their workload.
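The ranking step can be pictured with a short sketch. The field name idmc_probability, the 0.5 threshold, and the sample values are assumptions made for illustration; the source does not describe how the dashboard computes or filters its ranking.

```python
def rank_by_probability(predictions, threshold=0.5):
    """Sort studies by predicted IDMC probability, highest first, and flag likely cases."""
    ranked = sorted(predictions, key=lambda p: p["idmc_probability"], reverse=True)
    return [dict(p, flagged=p["idmc_probability"] >= threshold) for p in ranked]


# Hypothetical model output for three upcoming studies.
upcoming = [
    {"study_id": "X", "idmc_probability": 0.12},
    {"study_id": "Y", "idmc_probability": 0.81},
    {"study_id": "Z", "idmc_probability": 0.47},
]

for row in rank_by_probability(upcoming):
    print(row["study_id"], row["idmc_probability"], row["flagged"])
```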

As the model continues to perform well and demonstrate its accuracy, it has gained more and more publicity. Because the quality of the predictions depends directly on the quality of the information entered by other parties, such as project management and clinical operations, this project has also provided an excellent real-life example of promoting data citizenship in creating more value from data entered into systems.


Value Generated:

The IDMC Predictive Model powerfully demonstrates to non-technical users the value of data science tools and technologies. The IDMC Team can now maintain and use the model independently, without having to rely heavily on informatics resources.

This solution also promotes data citizenship and highlights the importance of generating business data that can be reused across teams, as well as leading the organization towards flexible construction of data flows that can later be stabilized as enterprise data layer standards.

Most importantly, this approach has the potential to expand beyond Roche’s internal scope, with a vision of collaborating with other pharmaceutical companies and even health authorities. By gathering information from various sources across the industry, our work could help ensure consistency with health authority guidelines for all studies and ultimately benefit patient safety.


Value Brought by Dataiku:

Dataiku proved to be effective for developing the IDMC Predictive Model, especially due to its user-friendly interface. It made the data modeling process graphically accessible to non-technical data users, which is a key aspect of Roche's own data democratization efforts. The platform's three seamlessly integrated components - a comprehensive non-coding data preparation layer, advanced visualization tools, and powerful AutoML functionality - made it easy to develop the predictive model without requiring extensive technical expertise or additional infrastructure.

The graphical representation of the data flow in Dataiku was particularly useful in engaging with subject matter experts and determining how to transform, standardize, and harmonize the data. This allowed for a more collaborative approach to the project and ensured that the final model met Roche's needs.

This project served as a flagship to showcase the platform's capabilities and encourage other Roche employees to adopt it. Overall, Dataiku proved to be a valuable asset to Roche's data-driven initiatives.


Value Type:

  • Reduce risk
  • Increase trust

The purpose of IDMCs is patient safety. A robust IDMC predictive model helps ensure consistency in how health authorities' safety guidelines are applied. This reduces risk for patients and increases trust in the process of determining the need for such committees.


Great work, everyone! This is absolutely impressive. Many won't appreciate the effort that went into consolidating data from siloed and heavily guarded sources, especially in one of the most regulated industries. Not to mention securing data owners' and stakeholders' interests and engagement to achieve such great results. For that, this goes far beyond merely adding value; it instills a data-first culture powered by tangible, real-life value that users can see and touch.

Again, absolutely amazing! 👏👏👏👏

Version history
Publication date:
06-08-2024 09:05 AM
Last update:
09-06-2023 09:24 AM