Davivienda - Using Survival Models to Accompany Customers Throughout Life's Milestones
Customer Analytic Models: Juan Esteban de la Calle, Lizeth Roldán Jimenez, Diego Alexander Maca García, Juan David Jaramillo, David Peinado
Davivienda is a leading regional bank that operates in six countries (Colombia, Panama, Costa Rica, El Salvador, Honduras, and the United States). With over 49 years of experience in the Colombian market, Davivienda offers a wide range of financial services to individuals, SMEs, and corporate customers. The bank currently serves over 20.9 million customers through a network of 665 branches and more than 2,692 ATMs.
Value at Scale
Excellence in Research
As a financial institution, Davivienda always looks to offer the best value to our customers, at the right time and through the right channel. This includes making offers that match the different moments in our customers' lives, which means combining historical knowledge of their behavior with a state-of-the-art methodology known as survival analysis. This type of model allows us to answer the question of when a mortgage, credit card, or car loan offer should be made to each customer.
There are two remarkable facts about the project. The first is that the proposed model answers the question of "when" instead of "whether." This reflects the idea that the offer must match the customer's specific needs at that particular time in their life.
The second is the set of products these models are designed for: offers for credit cards, mortgages, and car loans. In Colombia, as in many developing countries, it is common for young people to build a credit risk score for the first time with a low-limit credit card. What these three products have in common is that they are linked to specific milestones in a person's life: someone's first credit card, first mortgage, or first car.
For us, creating survival models with millions of rows, with several rows (one per month) per customer, was impossible until we had Dataiku.
Commonly, the loan offers made by a financial institution are selected using classification techniques such as logistic regression, random forests, or XGBoost. In these models, the answer is whether someone will choose a product or not, and the models are trained on datasets that reflect only a small portion of the customer's story.
Imagine having a video but being able to use only a few frames to describe it. The same simile describes the possibilities of this model: with classification, the available data is as limited as a single frame, whereas with a survival model, all of the customer's historical data can be used.
The model's ability to relate the moment a customer gets a mortgage, credit card, or car loan to the values of the covariates at that time is a powerful tool for these use cases.
Dataiku enabled us to use PySpark to gather the information in a format suitable for the model; this would not have been possible without Dataiku's capacity for big data queries. It allowed us to create a dataset grouped by customer and month, with hundreds of variables, which helped us model the events with survival analysis.
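As a small illustration of the shaping step, here is a pandas sketch of turning raw records into one row per customer per month (the team did this in PySpark at scale; the column names and values below are hypothetical, not Davivienda's actual data):

```python
import pandas as pd

# Hypothetical raw transaction-level records; columns are illustrative only.
raw = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(["2023-01-05", "2023-01-20", "2023-02-10",
                            "2023-01-15", "2023-02-25"]),
    "amount": [100.0, 50.0, 75.0, 200.0, 120.0],
})

# Collapse to one row per customer per month, the shape the survival
# model expects, with aggregated feature columns.
raw["month"] = raw["date"].dt.to_period("M")
panel = (raw.groupby(["customer_id", "month"], as_index=False)
            .agg(n_tx=("amount", "size"), total_spend=("amount", "sum")))
print(panel)
```

In the real flow this aggregation produced hundreds of variables per customer-month rather than the two shown here.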
After the data processing, a pipeline was created to connect the model-ready dataset with a Cox regression built in R. Without Dataiku, it would not have been possible to pipeline these two processes together.
Another advantage of using Dataiku was model governance. Seven people from different areas were involved in the modeling, curation, and deployment. Dataiku also made it easy to copy the first model (mortgage), which let us create the second (car loan) and third (credit card) models within a few clicks.
We measured the value obtained with an experiment.
In actual campaigns, a set of prospective mortgage customers selected by the model (the top n by predicted probability) was compared with a group selected by business rules and with a random control group. The suggestions made by the model were 5% more likely to be accepted than those of the business-rules group, and 45% more likely than those of the random control group.
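For intuition, relative lifts like these are computed from the acceptance rate of each experiment arm. The counts below are made up for illustration and are not Davivienda's campaign figures:

```python
# Hypothetical (accepted, contacted) counts per experiment arm.
arms = {"model": (420, 10_000), "rules": (400, 10_000), "random": (290, 10_000)}
rate = {arm: accepted / contacted for arm, (accepted, contacted) in arms.items()}

# Relative lift of the model arm over each baseline.
lift_vs_rules = rate["model"] / rate["rules"] - 1
lift_vs_random = rate["model"] / rate["random"] - 1
print(f"vs. business rules: {lift_vs_rules:+.1%}")
print(f"vs. random control: {lift_vs_random:+.1%}")
```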
The difference was even bigger when measured against mortgages, credit cards, or car loans opened at other banks in the country. These measures reaffirm the model's design, because it counts the opening of a mortgage, loan, or card at any bank as an event (positive case).
What's more, the model was completely automated, reducing the time to a new prediction to only four hours once the raw source datasets are ready.
Value Brought by Dataiku:
Dataiku allowed us to automate the model through scenarios and to create a pipeline directly connecting intermediate tables created with PySpark to R's survival library.
We were also able to divide the work into three squads: one in charge of creating a dataset of events drawn from three different sources of information (one external and two internal); one in charge of preparing the data until it was grouped by customer and date and ready for modeling; and one in charge of the modeling itself. Everything was coordinated through simple chat communication and comments in the Dataiku recipes.
Dataiku's flexibility let us try different packages (some in Python, some in R) to see which would be the most comfortable and usable and provide the best technical solution to the problem.
Dataiku turned a model usually studied only in academic environments with small, controlled datasets, such as Cox regression with time-varying covariates, into a scalable use case deployed monthly for millions of users, triggering product offers to our customers.
Deployment and curation of the models were also sped up, making it easy to review individual steps while the rest of the flow was still being processed.