ABS - Commercial Vessels Churn Prediction for Sustaining Long-Term Revenue
Ivan Chernukha, Senior Machine Learning Engineer, with:
Luke Rouquette, Director, Analytics Innovation
Harold Mitchell, Manager, Machine Learning Operations
Amin Asgharzadeh, Senior Data Scientist
Gabriel Fonteles, Rotational Engineer
Qiong Wu, Senior Data Scientist
Country: United States
The American Bureau of Shipping (ABS) is an American maritime classification society established in 1862. ABS' core business is providing global classification services to the marine, offshore and gas industries. As of 2020, ABS was the second largest class society with a classed fleet of over 12,000 commercial vessels and offshore facilities.
ABS develops its standards and technical specifications, known collectively as the ABS Rules and Guides. These Rules form the basis for assessing the design and construction of new vessels and the integrity of existing vessels and marine structures.
Best Acceleration Use Case
Best ROI Story
In our industry, opportunities to acquire customers (vessels) are limited and arise mainly at two events: when a vessel is being built, and when it transfers class from one classification society to another. This makes customer retention more important than in most other businesses, and it makes predicting and managing customer (vessel) churn a priority.
Vessels typically operate for 20 to 30 years, which makes retention even more important. Our primary strategic goal is therefore to prevent vessels from leaving our portfolio. This proactive approach maintains a continuous stream of revenue and supports long-term financial stability. A transfer of class can also affect the reputation of the classification society.
The first phase of the project covered data availability and exploration, in which the various datasets went through Exploratory Data Analysis (EDA). The EDA was followed by preprocessing: feature generation, normalization, joining and merging of datasets, and handling of duplicates and missing values. Dataiku provided handy tools for quick visualization of feature dependencies and high flexibility for custom Python preprocessing of those features. Various built-in, Python, and Structured Query Language (SQL) recipes were used to accomplish these steps.
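The preprocessing steps listed above can be sketched in a Python recipe roughly as follows. This is a minimal illustration using pandas, not the project's actual code: the tables, column names (`vessel_type`, `age_years`, `open_findings`), and the choice of median imputation and min-max scaling are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical vessel master data; column names are illustrative only.
vessels = pd.DataFrame({
    "vessel_id": [1, 2, 2, 3],
    "vessel_type": ["tanker", "bulk", "bulk", None],
    "age_years": [12.0, 5.0, 5.0, None],
})

# Hypothetical survey records, one row per survey event.
surveys = pd.DataFrame({
    "vessel_id": [1, 1, 2, 3],
    "open_findings": [3, 1, 0, 4],
})


def preprocess(vessels: pd.DataFrame, surveys: pd.DataFrame) -> pd.DataFrame:
    # Address duplicates: drop exact duplicate rows.
    clean = vessels.drop_duplicates().copy()

    # Address missing values: median for numeric, placeholder for categorical.
    clean["age_years"] = clean["age_years"].fillna(clean["age_years"].median())
    clean["vessel_type"] = clean["vessel_type"].fillna("unknown")

    # Feature generation: total open findings per vessel.
    findings = surveys.groupby("vessel_id", as_index=False)["open_findings"].sum()

    # Join the generated feature back onto the vessel table.
    merged = clean.merge(findings, on="vessel_id", how="left")
    merged["open_findings"] = merged["open_findings"].fillna(0)

    # Normalization: min-max scale each numeric feature to [0, 1].
    for col in ["age_years", "open_findings"]:
        lo, hi = merged[col].min(), merged[col].max()
        merged[col + "_norm"] = 0.0 if hi == lo else (merged[col] - lo) / (hi - lo)
    return merged


features = preprocess(vessels, surveys)
print(features)
```

In a real flow, each of these steps could equally be a separate visual or SQL recipe; they are combined here only to keep the sketch short.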
Next, we built a time-history dataset and aggregated it to prepare the train/test input for the machine learning model. The model was built in the Dataiku Lab, whose built-in machine learning algorithms were trained and compared to obtain the best model.
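One common way to turn a time history into model input is to aggregate each vessel's events up to a cutoff date and label the vessel by whether it churns within a horizon after that date. The sketch below illustrates that idea with pandas. The event log, the `overdue_items` column, the cutoff dates, and the 365-day horizon are hypothetical and are not taken from the project.

```python
import pandas as pd

# Hypothetical per-vessel event log (for example, surveys or service records).
events = pd.DataFrame({
    "vessel_id": [1, 1, 1, 2, 2, 3],
    "event_date": pd.to_datetime([
        "2021-03-01", "2021-09-15", "2022-02-10",
        "2021-06-20", "2022-05-05", "2021-11-30",
    ]),
    "overdue_items": [0, 2, 1, 0, 0, 5],
})

# Hypothetical class-transfer (churn) dates; vessels not listed have not churned.
churn_dates = pd.Series(pd.to_datetime(["2022-08-01"]), index=[3], name="churn_date")


def build_snapshot(events, churn_dates, cutoff, horizon_days=365):
    cutoff = pd.Timestamp(cutoff)
    # Use only history available at the cutoff, so no future data leaks in.
    history = events[events["event_date"] <= cutoff]
    snap = history.groupby("vessel_id").agg(
        n_events=("event_date", "count"),
        total_overdue=("overdue_items", "sum"),
        last_event=("event_date", "max"),
    )
    snap["days_since_last_event"] = (cutoff - snap["last_event"]).dt.days
    snap = snap.drop(columns="last_event")

    # Align churn dates with the snapshot; vessels without one get NaT.
    churn = churn_dates.reindex(snap.index)
    # Exclude vessels that had already left at the cutoff.
    churn_after = churn[~(churn <= cutoff)]
    snap = snap.loc[churn_after.index]

    # Label: 1 if the vessel churns within the horizon after the cutoff.
    horizon_end = cutoff + pd.Timedelta(days=horizon_days)
    snap["churned"] = ((churn_after > cutoff) & (churn_after <= horizon_end)).astype(int)
    snap["snapshot_date"] = cutoff
    return snap.reset_index()


# Time-based split: the earlier snapshot for training, the later one for testing.
train = build_snapshot(events, churn_dates, "2021-12-31")
test = build_snapshot(events, churn_dates, "2022-06-30")
print(train)
print(test)
```

Splitting by snapshot date rather than at random keeps the evaluation closer to how the model is used, since predictions are always made for a period after the data was collected.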
Once the model comparison was done, the best-fitting model was deployed to predict which vessels are likely to churn. The predictions are written to a database, from which they are exported to Power BI reports and shown to the end user. At this stage Dataiku delivered extra value: feature importance allowed us to present the top contributing factors in each assessment. A high-level diagram of the flow is presented below.
The tool helps the business development team keep track of high-risk vessels and potentially prevent churn. It also saves time when navigating the data on each high-risk vessel, because all supported data sources are integrated in one place in the Power BI application.
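The hand-off from model output to a reporting database can be sketched as follows. This is an illustration only: the scores are made-up values, the risk thresholds (0.7 and 0.4) and the table name `vessel_churn_predictions` are assumptions, and an in-memory SQLite database stands in for whatever database the Power BI reports actually read from.

```python
import sqlite3

import pandas as pd

# Hypothetical model output: one churn probability per vessel.
scores = pd.DataFrame({
    "vessel_id": [101, 102, 103, 104],
    "churn_probability": [0.82, 0.15, 0.55, 0.71],
})


def add_risk_band(scores: pd.DataFrame, high: float = 0.7, medium: float = 0.4) -> pd.DataFrame:
    # Thresholds are illustrative; in practice they would be set with the business team.
    out = scores.copy()
    out["risk_band"] = "low"
    out.loc[out["churn_probability"] >= medium, "risk_band"] = "medium"
    out.loc[out["churn_probability"] >= high, "risk_band"] = "high"
    return out


banded = add_risk_band(scores)

# Write the predictions to a database table that a reporting tool could query.
conn = sqlite3.connect(":memory:")
banded.to_sql("vessel_churn_predictions", conn, index=False, if_exists="replace")

# Example of the kind of query a report might run: high-risk vessels, riskiest first.
high_risk = pd.read_sql_query(
    "SELECT vessel_id, churn_probability FROM vessel_churn_predictions "
    "WHERE risk_band = 'high' ORDER BY churn_probability DESC",
    conn,
)
conn.close()
print(high_risk)
```

Storing a coarse risk band alongside the raw probability keeps the report filters simple while still letting account managers sort vessels by the underlying score.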
Business Area Enhanced: Analytics
Use Case Stage: In Production
The developed product has improved vessel churn prediction and opened further opportunities for customer retention. Because losing a vessel to a competitor is costly for ABS, this project has helped prevent loss of revenue.
Account managers can now drill into details about every high-risk vessel in their fleet and quickly access vessel information and the factors contributing to its churn risk. Various flags and metrics have been created to provide early warnings of a churn event. This project also saved hundreds of hours previously spent investigating potential churn events manually.
Value Brought by Dataiku:
Dataiku is the central piece of the product. Its contributions include, but are not limited to, the following:
1. Faster development through collaboration between data scientists and the integrators of prediction results into Power BI. Our first set of results was produced within two months of the start of the project. This speed resulted from contributions by three teams of experienced professionals (data science, data engineering, and reporting). By our estimates, it could have taken four to five months with a standard data science toolkit.
2. Native data connectors streamlined the integration of data dispersed among several tables in our data lakehouse.
3. Enhanced modeling capabilities in Dataiku helped us shortlist models and speed up feature selection, as we had hundreds of potentially useful features coming from many data sources.
4. Optimal allocation of resources through scaling with Kubernetes and in-database engine capabilities (Spark and SQL), and a streamlined process using metrics, checks, and scenarios.
5. Dataiku's version control, which allowed us to revert changes, came in handy during development.