
AstraZeneca - How Dataiku is Helping Enable Self-service AI and Data Analysis Across the Enterprise


Alice Smith, AI Platform Engineer
Hisham Jafar Ali, Citizen Data Science Lead
Callum Connolly, Business Analyst
Bhanuprathap Kari, Senior Consultant
Kathiresan Anandhan, Engineer
Shrinidhi Venkataraman, Associate Engineer
James Harpin, Senior Solution Architect
Karthik Ramasamy, Assistant Manager
Curtis Scholey, Digital and Technology Solutions Apprentice
Haseeb Ahmed, Technology Innovation Apprentice
Jane Amirthanayagam, AI Engineer

Country: United Kingdom
Organization: AstraZeneca

AstraZeneca is a global pharmaceutical company with a major UK presence. Our purpose is to push the boundaries of science to deliver life-changing medicines. The best way we can help patients is to be science-led and share this passion with the scientific, healthcare, and business communities of the UK.

Awards Categories:
  • Most Impactful Transformation Story
  • Excellence in Research


Business Challenge:

AstraZeneca had never attempted to cover the full landscape of data pipelines, machine learning, and data visualization within a single tool, because of the inherent complexity of building and maintaining such a broad spectrum of capabilities.

As a result, over the lifecycle of a project, from data wrangling through cleaning, manipulation, data science, visualization, and deployment, a user could end up working across multiple tools and platforms, one for each stage of the pipeline.

This fragmentation is a challenge for AstraZeneca because it increases:

  • Cost of technology with the need to continuously increase the catalog of tools available for data scientists.
  • Onboarding time for new project members due to the upskilling required for multiple different tools.
  • Time-to-value for the projects due to the lack of a comprehensive platform.

One of AstraZeneca’s core values is We Play to Win. However, in the data science space, we found that we were falling behind. Many teams were using outdated technological systems, without scalability or automation, and collaboration across these systems and teams was non-existent, even when using the same platform.



We noticed a lack of sustainability in every aspect of the data pipeline across the business: repeated models and code snippets, duplicated data stored in multiple sources, and a lack of communication between business areas. Not only does this increase the time, cost, and resources needed to produce value from a project, but it also works against AstraZeneca’s goal to be carbon negative by 2030.


Business Solution:

Dataiku has allowed AstraZeneca to provide core data science capabilities to 200+ users and 100+ data science projects, and these numbers are continually growing. The projects span R&D, Operations, and Commercial teams at a global level. The users range from experienced data scientists and ML engineers, who use the automation and deployment aspects of Dataiku, to Business Analysts with no previous experience in data science, who, with the aid of Dataiku, have been able to produce their first ML model.

Dataiku has enabled teams to use a centralized platform to perform all stages of their project lifecycle and has enabled collaboration across teams that was previously not possible. The multi-disciplinary teams at AstraZeneca can now work with a single source that meets the needs of all skill sets, enabling large-scale projects to be completely effective and efficient.

One of our aims with Dataiku is to democratize AI and create a self-service capability that puts the power of AI and analytics into the hands of employees. To enable this, we have created macros that automate all the steps in the project creation process, including group and connection creation, and we have produced project templates and new data connections that were not previously supported.

Our goal is to enable SMEs and non-technical users to produce valuable insights from their data, regardless of their technical ability. For instance, Dataiku has allowed one team to quickly stitch together disparate data sets to create a holistic view of the value chain and rapidly develop predictive forecasting capabilities that take lead time and yield into account, so that we can better understand our ability to fulfill commitments. Dataiku has enabled unparalleled visibility for this project and all upcoming work.


Value Generated:

At AstraZeneca, time to value is a key metric when assessing any project. Due to the nature of drug development, manufacturing, and supply, the speed at which these life-changing drugs can be provided to patients is our most important priority.

Here are some testimonials from our users about their time savings on the platform:

  • We estimate saving 1/4-1/3 of our time by deploying using Dataiku.
  • We decreased implementation time from 126 days to 42 days.
  • Dataiku is helping to reduce time by 2-3 days of previously manual work every budgeting cycle, with the added benefit of accuracy.
  • Using Dataiku to analyze historic data and identify delays is saving 50 hours per week across all CMOs.
  • We’re saving 20-40% of the time spent in the field by operations teams, with 1000s of man-hours saved monthly.
  • A total of 60 days were saved per annum across the team and our projects.

Alongside time savings, cost savings are an important measure of a project's value. Reducing the cost of a single project allows teams at AstraZeneca to produce more insights and work on more projects within a year.

The testimonials from our users show how Dataiku has allowed users to reduce their project overheads:

  • The analysis for a typical project costs at least $75K. Dataiku has provided an equivalent to at least $20K in saving for one project.
  • Using automation, we have a potential saving of £300k p.a.
  • We saved £290k moving from the standard development process.
  • Across the team and their projects, we made a total saving of €54k per annum in resource fees.



Value Brought by Dataiku:

Using Dataiku for data science across the enterprise at AstraZeneca generated this value for several reasons:

1. Connection to multiple data sources in a matter of minutes, enabling more insights

Our users were able to set up the connections needed for their projects within minutes. AstraZeneca uses data lakes to enable global access to data, so a single connection on Dataiku can provide multiple users with the data they need for their work. Combined with the ability to create new connections using the development plugins, this enabled our teams to work across multiple sources and data stores quickly and simply, where previously it would have taken hours and involved manual data ingestion and manipulation.

2. Self-service capabilities, shortening the time-to-value

One of the business teams using Dataiku within Operations commented that Dataiku is “the only platform that does not require support from IT to work on”. Due to the size of the organization, a recurring challenge is the time it takes to build and manage production-level data science tools. With Dataiku, the power is placed in the hands of the users. Our onboarding process takes ~30 minutes, and once users are on the platform, they can build a pipeline from data ingestion to automation, including building code environments and creating visualizations, without any admin or IT involvement.

3. Improved version control and governance to foster innovation

Dataiku has also enabled better and faster version control and governance for all projects on the platform. Using Dataiku’s integration with Bitbucket, we have created a pipeline to review and analyze plugins using SonarQube. This allows for faster security approval at AstraZeneca (from a month down to a day) while also allowing users to create their own development plugins within a guard-railed process.

We have also been working with the Dataiku team to implement the Collibra plugin. Users are now able to output their recipes, datasets, and models to Collibra to make them findable and usable by other business areas at AstraZeneca. This allows for a centralized management process and for reusable, secure data pipelines with a reduced risk of data loss.
