
The Ocean Cleanup - Leveraging Data Science Across the Organization to Multiply Positive Impact


Bruno Sainte-Rose, Lead Computational Modeler
Axel Peytavin, Computational Modeler
Yannick Pham, Computational Modeler 

Country: Netherlands

Organization: The Ocean Cleanup, Stichting

Every year, millions of tons of plastic enter the oceans, primarily from rivers. The plastic afloat across the oceans – legacy plastic – isn’t going away by itself. Therefore, solving ocean plastic pollution requires a combination of stemming the inflow and cleaning up what has already accumulated. The Ocean Cleanup, a non-profit organization, designs and develops advanced technologies to rid the world’s oceans of plastic by means of ocean cleanup systems and river interception solutions.


Awards Categories:

  • Most Impactful Ikigai Story


Business Challenge:

At The Ocean Cleanup, data science must serve not only to understand the plastic pollution problem in rivers and oceans and to develop the technical solutions that rid the oceans of plastic, but also to maximize funding opportunities and sustain the broader organization.

We started using Dataiku in 2018, as we were facing numerous data science challenges:

  1. Data pipeline management
  • First, we lacked an adequate tool for managing data processing pipelines that would allow ad hoc data updates and processing with optimal computing time.
  • Some of the data we handled was faulty (partly because satellite transmissions truncated messages), and we lacked a tool to quickly scan through the data and work out the right approach to correct it.
  • We also lacked a tool to automate pipeline updates, in particular based on specific triggers, that would also offer dashboarding and reporting options.
  2. Handling different formats and types of data
  • The data we work with can be structured or unstructured and comes in various formats from different providers. Consequently, we needed to handle single .csv files and databases side by side, and to retrieve data through built-in and/or provider-specific APIs.
  • Beyond this diversity in form, our data also differs in nature (scientific measurements, geospatial information, natural language, financial data, etc.), which also calls for a versatile data science processing solution.
  3. Lack of a centralized platform
  • None of the AI/machine learning frameworks we came across were very user-friendly, and they required too much expertise to be promoted internally.
  • Finally, we were looking for a collaborative data science platform supporting multiple users with specific roles, rights, and access levels.


Business Solution:

Thanks to Dataiku, we were able to address a large part of the challenges above. As part of the program, we were given access not only to the Dataiku platform at a company-wide level, but also to Dataiku staff and their expert knowledge to support the implementation of our data science projects.

  1. Empower people across the organization to gain insights

Having access to Dataiku allowed us to ramp up our data science analysis. The user-oriented, low-code approach of the Dataiku pipeline was a game changer for both our data pre-processing and post-processing steps. The extensive set of built-in operations to manipulate and prepare data made it possible for less programming-savvy staff to perform their usually very time-consuming operations; they felt like they were using a steroid-powered Excel! The built-in Git version control and the logging of each individual operation allow for a readable and sustainable project approach.

The collaborative environment and the overall user experience allowed for company-wide adoption. More than thirty people in the organization have participated in projects built in Dataiku, including a dozen recurring users who develop their own projects (out of roughly 100 people in the whole organization), spanning the Research, Rivers, Oceans, Valorization, and Communications departments. As part of the Ikigai program for nonprofits, Dataiku helped users get up to speed on the platform through training, project co-development, and ongoing support. Dataiku staff members, including Emilie Stojanowski, Matthieu Scordia, Paul Hervet, Jay Narhan, Gonzalo Betegon, Jacqueline Kuo, and Cassandra Suljian, among others, also contributed directly to developing some of the most innovative projects.

  2. Leverage & reuse data pipelines and features to save time

Regarding the product itself, Dataiku helped us better manage our data pipelines, so that we can track what has been done and leverage past work in future projects. For example, we first started using Dataiku to test our barriers in November 2018, and less than a year later we could easily replicate the workflow for a new test campaign, using these efficiencies to spend more time developing features. In November 2020, during a campaign in the North Sea, our engineers needed only a quick Dataiku training to reuse the previous data pipelines and features, so they could focus their time where they add the most value. The recently introduced flow zones allowed us to capture the overall operational pipeline of our current operation, which started in 2021, in a single flow, and data partitioning allowed us to speed up pipeline updates.
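Partitioning and flow zones are configured inside Dataiku itself, and the sketch below does not use any Dataiku API. It is only a minimal plain-Python illustration of why partitioning speeds up updates, assuming data partitioned by day and a hypothetical, deterministic `process_partition` step: on each update, only the partitions that have not been built yet are computed.

```python
from datetime import date


def process_partition(rows):
    """Hypothetical per-partition step: here, simply the mean of the readings."""
    return sum(rows) / len(rows) if rows else None


def incremental_update(raw_by_day, built):
    """Build only the day partitions missing from `built`; return the days processed."""
    new_days = sorted(day for day in raw_by_day if day not in built)
    for day in new_days:
        built[day] = process_partition(raw_by_day[day])
    return new_days


raw = {
    date(2021, 7, 1): [1.0, 3.0],
    date(2021, 7, 2): [2.0, 4.0],
}
built = {}
print(incremental_update(raw, built))  # first run: both days are built
raw[date(2021, 7, 3)] = [5.0]
print(incremental_update(raw, built))  # second run: only the new day is built
```

The sketch assumes a day's partition is final once written; a real setup would also need to rebuild partitions whose raw data changed after they were first built.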

  3. Adapt to different data (in format & type) and use cases as we expand

Thanks to its versatility, Dataiku enabled us to connect to many different data formats, APIs, plugins, etc. This is paramount because our data also differs in nature, and the platform's capabilities are key to adapting. For instance, visual pre-treatment features let us identify when satellite data has been cut off, without having to complete the entire preparation process, and filter it out, which saves much-needed time and resources for our nonprofit organization.
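In Dataiku this check is done with visual preparation features; the idea behind it can be sketched in a few lines of plain Python. This is a minimal sketch, assuming each satellite message is a comma-separated line with a fixed number of fields (the five-field layout below is hypothetical), so that a message cut off in transmission shows up as having too few fields or an empty trailing field.

```python
# Hypothetical message layout: timestamp, latitude, longitude, speed, heading
EXPECTED_FIELDS = 5


def is_complete(message: str, expected_fields: int = EXPECTED_FIELDS) -> bool:
    """Return True if the message has every expected field and none of them is empty."""
    fields = message.strip().split(",")
    return len(fields) == expected_fields and all(f.strip() for f in fields)


def split_messages(messages):
    """Separate complete messages from truncated ones so the latter can be inspected or dropped."""
    complete, truncated = [], []
    for msg in messages:
        (complete if is_complete(msg) else truncated).append(msg)
    return complete, truncated


raw = [
    "2021-07-01T00:00:00Z,32.1,-140.5,1.2,270",
    "2021-07-01T00:10:00Z,32.1,-140.",         # cut off mid-transmission
    "2021-07-01T00:20:00Z,32.2,-140.6,1.1,",   # last field missing
]
complete, truncated = split_messages(raw)
print(f"kept {len(complete)}, flagged {len(truncated)}")
```

Keeping the truncated messages in a separate list, rather than silently dropping them, makes it easier to decide later whether some of them can be repaired.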


Value Generated:

With Dataiku, The Ocean Cleanup has applied data science to a wide range of applications, each one a way to multiply our positive impact on the world:

  • Since its inception, The Ocean Cleanup has been implementing data science projects to help clean, so far, over 150,000 kg of plastic from the Great Pacific Garbage Patch and 1,500,000 kg from 7 rivers around the world with our ocean and river solutions. We are now in the scaling phase, where we aim to extract 100,000,000 kg from the Great Pacific Garbage Patch and address the 1,000 most polluting rivers in the world.
  • Saved time: for example, our team rolled out the new 2020 and 2021 campaigns 5 times faster by reusing, in just a few clicks, the workflows already built in Dataiku for the 2018 and 2019 campaigns.
  • Increased donor engagement by optimizing the timing of social posts and outreach, resulting in increased funding.
  • Built a beach cleanup database, to be shared in a scientific publication, to increase global awareness and, in practice, to better understand the shore cleaning process so that organizations around the world can optimize their actions, as well as to encourage them to contribute to the database, creating a virtuous circle.

Some projects, such as the beach cleanup project and the plastic density heterogeneity projects, have been co-developed by Dataiku and The Ocean Cleanup. Dataiku staff also worked on a project as part of an internal hackathon on June 6th. This initiative greatly helped us move beyond the status quo and steered us in promising directions.


Value Brought by Dataiku:

As a non-profit organization, our main Key Performance Indicator (KPI) is the quality/time ratio of the tools we use. In other words, our biggest objective is to have the most reliable yet versatile data science platform to efficiently conduct data projects and create the biggest impact across the organization.

Dataiku enabled us to dramatically improve this KPI through different levers:

  1. Improved operational efficiencies to focus resources on innovation

Previously, we had to extract data from SQL databases, aggregate it, build interpolations, etc. using a combination of Excel, MATLAB, and Python. Dataiku enabled us to centralize the whole workflow while letting all practitioners work with the technologies they’re used to, allowing us to move faster and go further into the most innovative parts.
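To illustrate the kind of extract, aggregate, and interpolate sequence that used to be split across several tools, here is a minimal single-script sketch using only Python's standard library. The in-memory `readings` table, its `hour` and `value` columns, and the choice of hourly means with linear gap-filling are all assumptions made for illustration; they are not The Ocean Cleanup's actual schema or method.

```python
import sqlite3


def hourly_means(conn):
    """Extract raw readings and aggregate them to one mean value per hour."""
    query = "SELECT hour, AVG(value) FROM readings GROUP BY hour ORDER BY hour"
    return dict(conn.execute(query).fetchall())


def interpolate_gaps(series, start, end):
    """Linearly interpolate missing hours between the nearest known neighbours."""
    known = sorted(series)
    filled = {}
    for h in range(start, end + 1):
        if h in series:
            filled[h] = series[h]
            continue
        before = max((k for k in known if k < h), default=None)
        after = min((k for k in known if k > h), default=None)
        if before is None or after is None:
            filled[h] = None  # cannot interpolate outside the observed range
        else:
            w = (h - before) / (after - before)
            filled[h] = series[before] + w * (series[after] - series[before])
    return filled


# Hypothetical in-memory database standing in for a production SQL source
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (hour INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(0, 1.0), (0, 3.0), (3, 8.0)],  # hours 1 and 2 have no readings
)
print(interpolate_gaps(hourly_means(conn), 0, 3))
```

Leaving hours outside the observed range as `None`, instead of extrapolating, keeps the sketch from inventing values it has no neighbours for.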

  2. Gather everyone on the same platform for quicker decision-making

Thanks to the visual interface, Dataiku enables both technical and non-technical stakeholders to understand the data workflow and the success metrics of the projects developed. This enables us to make quicker decisions, for instance about the success of specific campaigns, and to adjust on the go to meet our goals.

  3. Easy onboarding to bring in more people to better fit projects’ needs

The user-friendly interface makes it easy to onboard new people onto the platform. The learning resources, as well as the catalog of events and content, give people a broad perspective on data science topics. Our core data science team is supported by five times as many people across the organization, who have been given temporary access to bring their expertise to various projects.

  4. Visual “recipes” enable everyone to bring in their skills and shorten the time-to-insight

The visual features for data wrangling and visualization enable everyone to contribute their skills to successfully conduct data projects. Even project managers and the director can draw insights from the data at hand.

  5. Democratize data science through a versatile, all-in-one platform

Building upon all the levers described above, Dataiku’s biggest impact lies in the democratization of data science across the organization. From the original technical testing project, we’ve expanded usage to finance (understanding fundraising dynamics to increase the impact of our campaigns) and communications (optimizing social media content and timing to maximize visibility and, in turn, improve fundraising).

By bringing everyone together on the same platform and fostering the rise of “citizen data science”, Dataiku enabled us to embed data science across the organization and create more value toward fulfilling our mission.

Publication date: 12-09-2022 06:01 PM
Last update: 09-12-2022 08:01 PM