One Acre Fund - Scaling Data Science Insights to Better Serve Smallholder Farmers
Name: Emiel Veersma
Title: Data Scientist
Organization: One Acre Fund
Description: One Acre Fund is a nonprofit organization that supplies smallholder farmers in East Africa with asset-based financing and agriculture training services to reduce hunger and poverty. Headquartered in Kakamega, Kenya, the organization works with farmers in rural villages throughout Kenya, Rwanda, Burundi, Tanzania, Uganda, Malawi, Nigeria, Zambia, Ethiopia, and India. One Acre Fund actively serves more than 1 million farmer families.
One Acre Fund offers smallholder farmers an asset-based loan that includes: 1) distribution of seeds and fertilizer; 2) financing for farm inputs; 3) training on agriculture techniques; and 4) market facilitation to maximize profits. Each service bundle is around US$80 in value and includes crop insurance to mitigate the risks of drought and disease.
To receive the One Acre Fund loan and training, farmers must join a village group that is supported by a local One Acre Fund field officer. Field officers meet regularly with the farmer groups to coordinate delivery of farm inputs, administer trainings, and collect repayments.
One Acre Fund offers a flexible repayment system: farmers may pay back their loans in any increment at any time during the growing season. Beyond their core program model, One Acre Fund also offers smallholder farmers opportunities to purchase additional products and services on credit. These include solar lights and reusable sanitary pads.
Data Science for Good
AI Democratization & Inclusivity
Value at Scale
Operationalizing data science projects
The biggest challenge we faced at One Acre Fund was operationalizing our data science projects. Over the years, many clever data scientists came and went at our organization. They conducted impressive analyses, but once they left, the results were soon outdated and forgotten. There was no single root cause; rather, we faced several distinct challenges.
1. Coding makes reusability more difficult
The first challenge was that the data scientists were doing everything with code. It's hard to take over someone's project when the data, the model, and the processing steps are not visible. When data scientists' tenures did not overlap, taking over a project was so challenging that the new data scientist would simply start over.
2. One-time insights through local computing
Furthermore, our data scientists were not used to working with servers, so their code ran locally on their own computers. Code that runs locally can't interact with "live" systems, which relegated data science to the margins of the organization. Results never made it into production; they were used only as one-time insights.
3. No shared infrastructure for accessing data
Our final challenge was that we didn't have an infrastructure set up to share our data. We were not used to interacting with databases, so our data lived on individual computers. When a project was finished, the deliverable was a report that was hard to reproduce.
Since Dataiku is a full-stack data science platform, it helped us in many ways:
1. Automation to facilitate workflow maintenance
Initially, we were looking for a solution where we could schedule and run our Python and R code, integrate with Git, and run code in isolated environments. When we tried Dataiku, we set up a project to predict client repayments. We had analysed this before, but it was a complex process that took a lot of effort to maintain. With Dataiku, we could easily run our code, connect to our data warehouse, and schedule the flow.
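The case study does not show the repayment flow itself. Purely as an illustration, a simple rule-based step in such a flow might flag clients whose repayments lag behind the growing season, as in the sketch below (all field names, dates, and the tolerance threshold are hypothetical, not One Acre Fund's actual logic):

```python
from datetime import date

def flag_at_risk(clients, today, season_start, season_end, tolerance=0.15):
    """Flag clients whose repayment progress lags the elapsed season.

    `clients` maps client IDs to dicts with `loan` (total owed) and
    `repaid` (amount paid so far). A client is flagged when the share
    repaid trails the share of the season elapsed by more than `tolerance`.
    """
    elapsed = (today - season_start).days / (season_end - season_start).days
    at_risk = []
    for client_id, record in clients.items():
        progress = record["repaid"] / record["loan"]
        if progress < elapsed - tolerance:
            at_risk.append(client_id)
    return at_risk

clients = {
    "C001": {"loan": 80.0, "repaid": 60.0},
    "C002": {"loan": 80.0, "repaid": 10.0},
}
print(flag_at_risk(clients, date(2021, 7, 1), date(2021, 3, 1), date(2021, 11, 1)))
# → ['C002']
```

Scheduled daily, a step like this could feed field officers a fresh list of clients to follow up with, instead of the result living in a one-off local analysis.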
2. Optimize modeling thanks to model competition
Dataiku enabled us to train and compare different types of models side by side and to investigate the data. These features helped us more than we expected.
3. Visual interface to democratize data insights
In subsequent projects, we worked with less tech-savvy colleagues. Dataiku's point-and-click functionality enabled them to build complex ETL processes and store the data in the database.
This helps the organization democratize our data analysis and store the data in a central place. Because of the visual nature of the flows, we can easily work together and discuss the challenges we face during a project. Seeing the datasets halfway through the flow makes it easy to understand what is going on in the data and to share it with different stakeholders. Visualizing the steps of a process prevents many mistakes, and it is something we could no longer work without.
1. Scaling our data science initiatives
To date, we have created 70+ projects, of which 25 are in production, and we maintain more than 1,000 datasets on 33 connections. We're working with a small team, and this would not have been possible without a platform such as Dataiku. Although our team is small, more and more colleagues are working on Dataiku and are able to perform their own advanced data science. We have 25 active users who work together on the platform daily, and this number is growing rapidly. Before, we would not have been able to collaborate with such a large group.
2. Faster user onboarding & enablement
Dataiku saves us a lot of time. Last week, we introduced a Rwandan data analyst to the platform. We reproduced a project he had been working on and took it to production within an hour. He no longer had to download the dataset manually; thanks to the visual recipes, he could run his code against the database and easily inspect his intermediate steps. Before Dataiku, the project took him 5 days to build and run.
3. Upskilling our team & stakeholders
Dataiku also gives us access to techniques that weren't available to our data scientists before. For example, our farmers can now talk to a chatbot to receive information about the weather. The chatbot calls a Dataiku API endpoint, which accesses our stored forecasts. Without Dataiku, our data scientists would not have been able to set up an API by themselves; the same goes for scheduling and deploying code to a server.
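The endpoint itself is not shown in the case study. As an illustration only, the lookup logic behind such a forecast endpoint might resemble the sketch below; the village names, stored data, and message format are all invented for this example:

```python
# Hypothetical stand-in for the logic behind a weather-forecast API
# endpoint: the chatbot sends a village name, and the endpoint looks up
# the latest stored forecast and returns a short message for the farmer.

STORED_FORECASTS = {
    "kakamega": {"date": "2021-04-12", "summary": "light rain expected"},
}

def forecast_reply(village: str) -> str:
    """Return a chatbot-friendly weather message for a village."""
    forecast = STORED_FORECASTS.get(village.strip().lower())
    if forecast is None:
        return "Sorry, no forecast is available for your area yet."
    return f"Forecast for {forecast['date']}: {forecast['summary']}."

print(forecast_reply("Kakamega"))
# → Forecast for 2021-04-12: light rain expected.
```

In a deployed setup, a function like this would be exposed as an HTTP endpoint, with the forecast table refreshed by a scheduled flow rather than hard-coded.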
Overall, Dataiku helps us become a data-driven organization, and we could no longer work without it.