Bruno Sainte-Rose, Lead Computational Modeler
Axel Peytavin, Computational Modeler
Yannick Pham, Computational Modeler
Organization: The Ocean Cleanup, Stichting
Every year, millions of tons of plastic enter the oceans primarily from rivers. The plastic afloat across the oceans – legacy plastic – isn’t going away by itself. Therefore, solving ocean plastic pollution requires a combination of stemming the inflow and cleaning up what has already accumulated. The Ocean Cleanup, a non-profit organization, designs and develops advanced technologies to rid the world’s oceans of plastic by means of ocean cleanup systems and river interception solutions.
At The Ocean Cleanup, data science is needed not only to understand the plastic pollution problem in rivers and oceans and to develop the technical solutions that rid the oceans of plastic, but also to maximize opportunities for funding and sustaining the broader organization.
We started using Dataiku in 2018, as we were facing numerous data science challenges:
Thanks to Dataiku, we were able to address a large part of the aforementioned challenges. Through the Ikig.ai program, we were given company-wide access to the Dataiku platform, as well as access to Dataiku staff and their expert knowledge to support the implementation of our data science projects.
Having access to Dataiku allowed us to ramp up our data science analysis. The user-oriented, code-minimal approach of the Dataiku pipeline was a game changer for both our data pre-processing and post-processing steps. The breadth of built-in operations for manipulating and preparing data made it possible for less programming-savvy staff to perform operations that were usually very time-consuming; to them, it felt like using a steroid-powered Excel! The built-in Git version control and the logging of each individual operation make for a readable and sustainable project approach.
The collaborative environment and the overall user experience allowed for company-wide adoption. More than thirty people in the organization (out of ~100 in total) have participated in projects built in Dataiku, including a dozen recurrent users who are developing their own projects, spanning the Research, Rivers, Oceans, Valorization and Communications departments. As part of the Ikig.ai program for nonprofits, Dataiku helped users get started on the platform through training, project co-development, and ongoing support. Dataiku staff also contributed directly to the development of some of the most innovative projects, including Emilie Stojanowski, Matthieu Scordia, Paul Hervet, Jay Narhan, Gonzalo Betegon, Jacqueline Kuo, and Cassandra Suljian, among others.
Regarding the product itself, Dataiku helped us better manage our data pipelines, so that we can track what has been done and build on it in future projects. As an example, we first used Dataiku for the testing of our barriers in November 2018; less than a year later, we could easily replicate the workflow for a new test campaign and spend the time saved on developing features. In November 2020, during a campaign in the North Sea, our engineers needed only a quick Dataiku training to reuse the previous data pipelines and features, so they could focus their time where they add the most value. More recently, flow zones allowed us to capture the overall operational pipeline of our current operation, which started in 2021, in a single flow, and data partitioning allowed us to speed up the updating of the pipelines.
Thanks to its great versatility, Dataiku enabled us to connect to many different data formats, APIs, plugins, etc. This is paramount because the data we handle is itself varied in nature, and the platform's capabilities are key to adapting to it. For instance, visual pre-treatment features allow us to identify when satellite data is cut and filter it out without needing to complete the entire preparation process, which saves much-needed time and resources for our nonprofit organization.
With Dataiku, The Ocean Cleanup has applied data science to a wide range of applications, each of them a way to multiply our positive impact on the world:
Some projects, such as the beach cleanup project and the plastic density heterogeneity project, have been co-developed by Dataiku and The Ocean Cleanup. We also had the opportunity to have Dataiku staff work on a project as part of an internal hackathon on June 6th. This initiative greatly helped us improve on the status quo and steered us towards promising directions.
As a non-profit organization, our main Key Performance Indicator is the quality/time ratio of the tools that we are using. In other words, our biggest objective is to have the most reliable yet versatile data science platform to efficiently conduct data projects and create the biggest impact across the organization.
Dataiku enabled us to dramatically improve this KPI through different levers:
Before, we had to extract data from SQL databases, aggregate it, build interpolations, etc. with a combination of Excel, Matlab and Python. Dataiku enabled us to centralize the whole workflow while letting all practitioners work with the technologies they are used to, so we move faster and go further into the most innovative parts.
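To make the extract, aggregate and interpolate steps mentioned above concrete, here is a minimal Python sketch. It is an illustration only, not The Ocean Cleanup's actual pipeline: the `readings` table, its `hour` and `value` columns, the sample values and the helper names are all hypothetical, and an in-memory SQLite database stands in for a real SQL source.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical 'readings' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (hour INTEGER, value REAL)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        [(0, 1.0), (0, 3.0), (1, 4.0), (3, 8.0)],  # hour 2 has no readings
    )
    return conn


def hourly_means(conn):
    """Extract and aggregate: one mean value per hour that has readings."""
    rows = conn.execute(
        "SELECT hour, AVG(value) FROM readings GROUP BY hour ORDER BY hour"
    ).fetchall()
    return dict(rows)


def fill_gaps(means, first_hour, last_hour):
    """Interpolate linearly between the nearest hours that have data."""
    known = sorted(means)
    filled = {}
    for hour in range(first_hour, last_hour + 1):
        if hour in means:
            filled[hour] = means[hour]
            continue
        before = [h for h in known if h < hour]
        after = [h for h in known if h > hour]
        if not before or not after:
            filled[hour] = None  # outside the known range: no extrapolation
            continue
        h0, h1 = before[-1], after[0]
        weight = (hour - h0) / (h1 - h0)
        filled[hour] = means[h0] + weight * (means[h1] - means[h0])
    return filled


series = fill_gaps(hourly_means(build_demo_db()), 0, 3)
print(series)
```

In this sketch the missing hour is filled from its two neighbours; a real campaign would likely need a different time resolution and interpolation method.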
Thanks to the visual interface, Dataiku enables both technical and non-technical stakeholders to understand the data workflow and the success metrics of the projects developed. This enables us to make quicker decisions, for instance regarding the success of specific campaigns, and to adjust on the go to meet our goals.
The user-friendly interface makes it easy to onboard new people onto the platform. The learning resources, as well as the catalog of events and content, give people a broad perspective on data science topics. Our core data science team is aided by five times as many people across the organization, who have been given temporary access to bring their expertise to various projects.
The visual features for data wrangling and visualization enable everyone to contribute their skills to successfully conduct data projects. Even the project managers and director can draw insights from the data at hand.
Building upon all the levers described above, Dataiku’s biggest impact lies in the democratization of data science across the organization. From the original technical testing project, we’ve expanded usage to finance (understanding fundraising dynamics to increase the impact of our campaigns) and communications (optimizing social media content and timing to maximize visibility, and therefore improve fundraising abilities).
By bringing everyone together on the same platform and fostering the rise of “citizen data science”, Dataiku enabled us to embed data science across the organization and create more value toward fulfilling our mission.