The Ocean Cleanup - Creating the World's Largest Beach Cleanup Database to Optimize Positive Impact
Adrien Guénard, Computational Modeler
Yannick Pham, Computational Modeler
Sarah-Jeanne Royer, Oceanographer
Bruno Sainte-Rose, Lead Computational Modeler
Jacqueline Kuo, Data Scientist, Dataiku
Cassandra Chuljian, Data Scientist, Dataiku
Gonzalo Betegon, Data Scientist, Dataiku
Organization: The Ocean Cleanup
Every year, millions of tons of plastic enter the oceans primarily from rivers. The plastic afloat across the oceans – legacy plastic – isn’t going away by itself. Therefore, solving ocean plastic pollution requires a combination of stemming the inflow and cleaning up what has already accumulated. The Ocean Cleanup, a non-profit organization, designs and develops advanced technologies to rid the world’s oceans of plastic by means of ocean cleanup systems and river interception solutions.
Data Science for Good
Plastic pollution is a major threat to aquatic environments. Each year, more than 5 million tons of plastic enter the ocean, which underlines the urgency to act. The Ocean Cleanup’s mission is twofold: to prevent plastic from entering the ocean by intercepting it in rivers, and to clean up legacy pollution by removing the plastic that has already accumulated offshore.
Because 97% of the floating plastic entering the oceans ends up onshore, many organizations also focus on beaches for their cleanup activities. Beach cleanups take place at different scales, from global, methodical campaigns run by large organizations to local, volunteer-driven cleanups.
Beach cleanup data are available in large quantities, but no centralized database exists. Each organization reports data in its own way, and the level of detail often depends on the size of the organization and of its cleanup activities.
The first aim of this project was to collaborate with different associations to build the world’s largest beach cleanup database, gathering all available beach cleanup data in a standardized format that enables global use. Analyzing this unique database is also a major opportunity to better understand how plastic ends up on beaches, which in turn can improve plastic cleanup strategies.
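To illustrate what gathering data under a standardized format can involve, here is a minimal Python sketch. It is not the project's actual pipeline: the organization keys (`org_a`, `org_b`), the column names, and the target schema are all hypothetical, chosen only to show how differently reported rows could be mapped onto common fields and units.

```python
# Hypothetical per-organization column mappings (illustrative names only).
COLUMN_MAPS = {
    "org_a": {"Date": "date", "Lat": "latitude", "Lon": "longitude",
              "Total kg": "weight_kg"},
    "org_b": {"cleanup_day": "date", "y": "latitude", "x": "longitude",
              "weight_lbs": "weight_lb"},
}

LB_TO_KG = 0.45359237  # exact definition of the international pound


def standardize(row, source):
    """Map one raw row (a dict) from `source` onto the common schema."""
    mapping = COLUMN_MAPS[source]
    # Rename only the columns we know about; ignore everything else.
    out = {target: row[col] for col, target in mapping.items() if col in row}
    # Convert pounds to kilograms so every row uses the same unit.
    if "weight_lb" in out:
        out["weight_kg"] = float(out.pop("weight_lb")) * LB_TO_KG
    # Cast numeric fields, which often arrive as strings from CSV exports.
    for key in ("latitude", "longitude", "weight_kg"):
        if key in out:
            out[key] = float(out[key])
    out["source"] = source  # keep provenance for later quality checks
    return out
```

Keeping the mappings in a single dictionary means that adding a new organization only requires a new entry rather than new code.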
In total, 5 users worked on the project in Dataiku, while several volunteers contacted over 20 organizations to gather the data. The project has spanned 18 months since its inception. Among the participants, 3 Dataiku staff members helped us through the Ikig.AI program.
The longest phase of the project was data cleaning, as the data needed to be reviewed, organized, and standardized before inclusion in the final dataset. For example, 306,225 rows lacked country information and had to be fixed through geocoding. Geocoding was not the only data cleaning activity: almost 100,000 rows required a weight computation based on the plastic item description.
Dataiku made this step much faster by providing data visualization tools that gave us an easy and immediate way to check the quality of the data. After cleaning the data, we performed geospatial analysis to identify where large loads of plastic had been collected. Studying the plastic collected in these areas, called hotspots, enhances scientific knowledge about plastic transport in aquatic environments.
Dataiku’s visualization features were a major asset for understanding the geographical distribution of the hotspots and for building a global beach cleanup map.
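A weight computation based on item descriptions could look roughly like the sketch below. It assumes a simple keyword lookup; the keywords and unit weights are placeholder values invented for illustration, not figures used by The Ocean Cleanup, and the function name `estimate_weight_kg` is hypothetical.

```python
# Placeholder unit weights in grams per item (illustrative values only,
# NOT the figures used in the project).
UNIT_WEIGHT_G = {
    "bottle": 20.0,
    "bottle cap": 2.0,
    "bag": 5.0,
}


def estimate_weight_kg(description, count):
    """Estimate the total weight in kg of `count` items from a free-text description.

    Returns None when no known keyword appears, so that such rows can be
    flagged for manual review instead of silently receiving a guess.
    """
    text = description.lower()
    matches = [keyword for keyword in UNIT_WEIGHT_G if keyword in text]
    if not matches:
        return None
    # Prefer the longest keyword so "bottle cap" wins over "bottle".
    keyword = max(matches, key=len)
    return UNIT_WEIGHT_G[keyword] * count / 1000.0
```

Returning `None` for unmatched descriptions keeps estimated and unknown weights distinguishable in the final dataset.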
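The text does not specify how hotspots were identified. One simple approach, shown here purely as an assumption, is to bin cleanup records into a regular latitude/longitude grid and rank the cells by total collected weight. The function name `find_hotspots`, the record layout, and the 1-degree default cell size are illustrative choices.

```python
import math
from collections import defaultdict


def find_hotspots(records, cell_deg=1.0, top_n=3):
    """Sum collected weight per grid cell and return the heaviest cells.

    records: iterable of (latitude, longitude, weight_kg) tuples.
    Returns a list of ((lat_index, lon_index), total_kg), heaviest first,
    where each index is the floor of the coordinate divided by cell_deg.
    """
    totals = defaultdict(float)
    for lat, lon, weight_kg in records:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        totals[cell] += weight_kg
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]
```

Cells of equal angular size cover less area at high latitudes, so a real analysis might prefer an equal-area grid or a clustering method.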
Business Area Enhanced: Analytics
Use Case Stage: In Progress
The dataset built throughout this project gathers data from 20 organizations over a time range of 10 years, covering cleanups performed in more than 100 countries. The final dataset contains 915,000 data points, making it the world’s largest beach cleanup database. This first result is a major success, as the dataset will benefit the community.
First, this dataset will serve to increase global knowledge, as it is awaiting publication in a scientific journal. It will also bring immediate practical benefits by providing a better understanding of the shore cleaning process, helping organizations around the world optimize their actions. This should also start a virtuous circle, as these organizations will be incentivized to contribute further data.
We will also conduct further analyses (e.g., spotting the most at-risk areas or comparing beach cleaning with river cleaning) to enable us and fellow nonprofit organizations to increase our impact on removing plastic pollution.
We’re also looking to build a citizen science application where anyone can enter data on the amount of plastic removed from nature, to obtain a more accurate picture of all the plastic cleaned up worldwide.
Value Brought by Dataiku:
The interactive visualizations provided by Dataiku saved a considerable amount of time during the project.
Through the Ikig.AI initiative, Dataikers accompanied us and dedicated some of their time to the project. The data pipeline created in Dataiku made it possible to propagate the analysis very quickly whenever new data was added. 5 Dataikers were involved in this project on a weekly basis, contributing from 5 to 20 hours.
Thanks to the support of experienced Dataikers, new users were able to produce results 1-2 weeks after starting. Dataiku made it possible to centralize both the different datasets and the coding recipes. The data visualization tools proved useful for geospatial data reviews; alternative solutions based on code shared on GitHub would have taken much longer to develop.