The Ocean Cleanup – Empowering Citizen Data Scientists Across the Organization
Name:
Bruno Sainte-Rose
Title:
Lead Computational Modeler
Country:
Netherlands
Organization:
The Ocean Cleanup, Stichting
Description:
Every year, millions of tons of plastic enter the oceans primarily from rivers. The plastic afloat across the oceans – legacy plastic – isn’t going away by itself. Therefore, solving ocean plastic pollution requires a combination of stemming the inflow and cleaning up what has already accumulated. The Ocean Cleanup, a non-profit organization, designs and develops advanced technologies to rid the world’s oceans of plastic by means of ocean cleanup systems and river interception solutions.
Awards Categories:
- Data Science for Good
- AI Democratization & Inclusivity
Challenge:
At The Ocean Cleanup, data science is applied not only to develop the technology needed to rid the oceans of plastic, but also to maximize funding opportunities and sustain the broader organization.
We started using Dataiku in 2018, as we were facing numerous data science challenges:
1. Data pipelines management
First, we lacked an adequate tool to manage data processing pipelines, one that would allow for ad hoc data updates and processing with optimal computing time.
Some of the data we were handling was faulty (in part because satellite transmissions truncate messages), and we had no tool to quickly scan through the data and work out the right approach to correct it.
We were also missing a tool to automate pipeline updates, in particular based on specific triggers, while also offering dashboarding and reporting options.
2. Handling different formats and types of data
The data we manipulate can be structured or unstructured and comes in various formats from different providers. As a consequence, being able to simultaneously handle single .csv files, databases, and data retrieved through built-in and/or provider-specific APIs was compulsory.
Along with this diversity in format, we also manipulate data of very different natures (scientific measurements, geospatial information, natural language, financial data, etc.), which calls for a versatile data science processing solution.
3. Lack of centralized platform
The AI/machine learning frameworks we came across were not user-friendly and required too much expertise to be promoted internally.
Finally, we were looking for a collaborative data science platform that supports multiple users with specific roles, rights, and access levels.
Solution:
Thanks to Dataiku, we were able to address a large part of the aforementioned challenges. Through the Ikigai program, we were given access not only to the Dataiku platform at a company-wide level, but also to Dataiku staff and expert knowledge to support the implementation of our data science projects.
1. Empower people across the organization to gain insights
Having access to Dataiku allowed us to ramp up our data science analysis. The user-oriented, low-code approach of the Dataiku pipeline was a game changer for both our data pre-processing and post-processing steps. The extensive set of built-in operations to manipulate and prepare data made it possible for less programming-savvy staff to perform their usually very time-consuming operations; to them, it felt like using a steroid-powered Excel! The built-in Git version control and the logging of each individual operation allow for a readable and sustainable project approach.
The collaborative environment and the overall user experience allowed for company-wide adoption. As part of the Ikigai program for nonprofits, Dataiku enabled users on the platform through training, project co-development, and support. Dataiku staff, including Emilie Stojanowski, Matthieu Scordia, Paul Hervet, and Jacqueline Kuo, among others, also contributed directly to developing some of the most innovative projects.
2. Leverage & reuse data pipelines and features to save time
With regard to the product itself, Dataiku helped us better manage our data pipelines, so we can track what has been done and leverage past accomplishments for future projects. We first started using Dataiku to test our barriers in November 2018; less than a year later, we easily replicated the same data workflow for a new test campaign, using these efficiencies to spend more time developing features. In November 2020, during a campaign in the North Sea, our engineers needed only a quick Dataiku training to reuse the previous data pipelines and features, letting them focus their time where they add the most value.
3. Adapt to different data (in format & type) and use cases as we expand
Thanks to its great versatility, Dataiku enabled us to connect to many different data formats, APIs, plugins, etc. This is paramount because the data we handle also differ in nature, and the platform's capabilities are key to adapting. For instance, visual pre-treatment features allow us to identify when satellite messages have been cut off and filter them out without running the entire preparation process, which saves much-needed time and resources for our nonprofit organization; a sketch of this kind of check is shown below.
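To illustrate the idea behind this check (outside of Dataiku's visual interface), here is a minimal Python sketch that flags and drops truncated satellite messages; the DataFrame layout, column names, and nominal message length are assumptions made for the example, not our actual schema.

```python
import pandas as pd

# Illustrative only: decoded satellite messages with an expected payload length.
# Column names and the nominal length are assumptions, not the real schema.
NOMINAL_PAYLOAD_LENGTH = 64  # assumed length (in characters) of a complete message

messages = pd.DataFrame({
    "message_id": [1, 2, 3, 4],
    "payload": ["A" * 64, "B" * 30, "C" * 64, "D" * 12],
})

# Flag messages that were cut short during satellite transmission ...
messages["is_truncated"] = messages["payload"].str.len() < NOMINAL_PAYLOAD_LENGTH

# ... and keep only complete messages for the downstream preparation steps.
clean_messages = messages[~messages["is_truncated"]].copy()

print(f"Dropped {int(messages['is_truncated'].sum())} truncated messages out of {len(messages)}.")
```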
Impact:
As a nonprofit organization, our main key performance indicator is the quality-to-time ratio of the tools we use. In other words, our biggest objective is to have the most reliable yet versatile data science platform, so that we can conduct data projects efficiently and create the biggest impact across the organization.
Dataiku enabled us to dramatically improve this KPI through different levers:
1. Improved operational efficiencies to focus resources on innovation
Previously, we had to extract data from SQL databases, aggregate it, build interpolations, etc., using a combination of Excel and MATLAB. Dataiku enabled us to centralize the whole workflow, while every practitioner can keep working with the technologies they are used to, which makes us move faster and lets us go further into the most innovative parts; a sketch of the kind of workflow we centralized is shown below.
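As an illustration of the extract, aggregate, and interpolate steps that now live in the centralized pipeline, here is a minimal Python sketch using an in-memory SQLite table as a stand-in for a real database; the table name, column names, and values are assumptions made for the example.

```python
import sqlite3
import pandas as pd

# Stand-in for a real SQL database; the schema and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE load_measurements (measured_at TEXT, load_kn REAL)")
conn.executemany(
    "INSERT INTO load_measurements VALUES (?, ?)",
    [
        ("2020-11-01 00:10:00", 12.4),
        ("2020-11-01 00:40:00", 13.1),
        ("2020-11-01 02:05:00", 11.8),  # no measurement during the 01:00 hour
        ("2020-11-01 03:20:00", 12.9),
    ],
)

# 1. Extract: pull the raw measurements out of the database.
raw = pd.read_sql_query(
    "SELECT measured_at, load_kn FROM load_measurements",
    conn,
    parse_dates=["measured_at"],
)

# 2. Aggregate: average the measured load per hour.
hourly = raw.set_index("measured_at")["load_kn"].resample("1H").mean()

# 3. Interpolate: fill hours that have no measurements.
hourly_filled = hourly.interpolate(method="time")

print(hourly_filled)
```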
2. Gather everyone on the same platform for quicker decision-making
Thanks to the visual interface, Dataiku enables both technical and non-technical stakeholders to understand the data workflows and the success metrics of the projects we develop. This enables us to make quicker decisions, for instance about the success of specific campaigns, and to make adjustments on the go to meet our goals.
3. Easy onboarding to bring in more people to better fit projects’ needs
The user-friendly interface makes it easy to onboard new people onto the platform. The learning resources, as well as the catalog of events and content, give people a broad perspective on data science topics. Our core data science team is supported by five times as many people across the organization, who are given temporary access to bring their expertise to various projects.
4. Visual “recipes” enable everyone to bring in their skills and shorten the time-to-insight
The visual features for data wrangling and visualization enable everyone to contribute their skills to successfully conducting data projects. Even project managers and the director are able to draw insights from the data at hand.
5. Democratize data science through a versatile, all-in-one platform
Building upon all the levers described above, Dataiku's biggest impact lies in the democratization of data science across the organization. From the original technical testing project, we have expanded usage to finance (understanding fundraising dynamics to increase the impact of our campaigns) and communications (optimizing social media content and timing to maximize visibility and therefore improve our fundraising ability).
By bringing everyone together on the same platform and fostering the rise of "citizen data science," Dataiku enabled us to embed data science across the organization and create more value toward fulfilling our mission.