Davivienda - Building a Survival Model to Predict the Probability of Employee Attrition

pjones, Frontrunner 2022 Participant


Paul Jones Segura
Juan Esteban de la Calle Echeverri

Country: Colombia

Organization: Davivienda

Davivienda is a Colombian financial company with more than four decades of experience. It has a presence in six countries: Colombia, Panama, Costa Rica, Honduras, El Salvador, and the United States. It has over 17,300 employees and more than 19.3 million clients.

Awards Categories:

  • Data Science for Good
  • Responsible AI
  • Most Impactful Transformation Story

Business Challenge:

Employee attrition is a significant problem for many companies in the banking industry, and replacing valuable employees can be very costly. Attrition rates also vary by department, with some departments exceeding 10%.

Finding attrition data was also a challenge because the information was siloed across different data systems. As a result, the first step was to gather all the information required to build a single view of employee attrition across the company.

Business Solution:

Our project focused on building a survival model to predict each employee's probability of attrition. For scalability and automation, we wanted a single tool that integrated every step: connecting to data sources; building, testing, and deploying machine learning models and their predictions; and finally generating a dashboard to show the results to end users.

With Dataiku DSS, we could integrate all the stages of our project. The data process started by retrieving all necessary sources from our data lake, including both HR data and client data: at a financial institution, our employees are not just employees but also clients who interact with the organization in many different ways.
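Survival models need, for each employee, an observed duration and an event indicator, with employees who are still active treated as right-censored at the data snapshot date. The sketch below shows that preparation step in plain Python. The field names (`employee_id`, `hire_date`, `termination_date`, `products_held`) and the whole-month tenure rule are hypothetical illustrations, not Davivienda's actual schema or logic.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SurvivalRecord:
    employee_id: str
    tenure_months: int   # observed duration
    left: bool           # True = attrition observed, False = right-censored
    products_held: int   # hypothetical client-side feature


def months_between(start: date, end: date) -> int:
    """Whole calendar months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    # Do not count a month that has not been completed yet.
    if end.day < start.day:
        months -= 1
    return max(months, 0)


def build_survival_records(hr_rows, client_rows, snapshot: date):
    """Join HR and client data (hypothetical schemas) into survival records.

    hr_rows: iterable of dicts with employee_id, hire_date, termination_date (or None)
    client_rows: dict mapping employee_id -> number of bank products held as a client
    """
    records = []
    for row in hr_rows:
        end: Optional[date] = row["termination_date"]
        left = end is not None and end <= snapshot
        # Employees still active at the snapshot are censored at that date.
        stop = end if left else snapshot
        records.append(
            SurvivalRecord(
                employee_id=row["employee_id"],
                tenure_months=months_between(row["hire_date"], stop),
                left=left,
                products_held=client_rows.get(row["employee_id"], 0),
            )
        )
    return records


hr = [
    {"employee_id": "E1", "hire_date": date(2019, 3, 15), "termination_date": date(2021, 9, 30)},
    {"employee_id": "E2", "hire_date": date(2020, 1, 10), "termination_date": None},
]
clients = {"E1": 2, "E2": 5}
recs = build_survival_records(hr, clients, snapshot=date(2022, 6, 30))
# E1: 30 months, left=True; E2: 29 months, censored (left=False)
```

Keeping censored employees in the data, rather than dropping everyone who has not left yet, is what distinguishes survival modeling from a plain "left / stayed" classifier.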

Next, we performed all the data processing, using visual recipes and, for specific tasks, code recipes. We then built and tested different survival models to identify the employees with a greater likelihood of attrition. It was also necessary to estimate these probabilities and explain them, that is, to quantify how much each variable contributed to each employee's probability.
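The write-up does not say which survival models or attribution method the team used. As one common option, a Cox proportional hazards model gives each employee a linear predictor, and its difference from an employee with average covariates splits additively into per-variable terms beta_j * (x_j - mean_j), which makes "how much each variable explains" easy to report. The sketch assumes an already fitted model; the coefficients, feature names, and means are made-up placeholders, not results from this project.

```python
import math

# Hypothetical fitted Cox coefficients (log hazard ratios) and training-set means.
COEFS = {"tenure_years": -0.20, "salary_band": -0.15, "overtime_hours": 0.03}
MEANS = {"tenure_years": 5.0, "salary_band": 3.0, "overtime_hours": 10.0}


def contributions(employee: dict) -> dict:
    """Per-variable contribution to the log relative hazard vs. mean covariates."""
    return {name: beta * (employee[name] - MEANS[name]) for name, beta in COEFS.items()}


def relative_hazard(employee: dict) -> float:
    """exp(sum of contributions): hazard relative to an employee with mean covariates."""
    return math.exp(sum(contributions(employee).values()))


def survival_at(baseline_survival: float, employee: dict) -> float:
    """Cox model: S(t | x) = S0(t) ** exp(beta . (x - mean)), S0 evaluated at the means."""
    return baseline_survival ** relative_hazard(employee)


emp = {"tenure_years": 2.0, "salary_band": 2.0, "overtime_hours": 25.0}
contrib = contributions(emp)
# tenure: -0.20 * (2 - 5) = 0.60; salary: -0.15 * (2 - 3) = 0.15; overtime: 0.03 * 15 = 0.45
ranked = sorted(contrib.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

Here short tenure and high overtime are the main drivers for this (fictional) employee. Other models, such as random survival forests, would need a separate attribution method instead of this additive decomposition.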

Finally, we presented the results of this project in a dashboard built in Power BI for end users.

Value Generated:

This project helps HR users better understand employee attrition and identify the employees with a greater probability of leaving the organization within the next few months. Because the solution is fully automated, the process is also quick and easy to run.
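Predicting who will leave "within the next few months" is a conditional probability: for an employee still active at tenure t, the chance of leaving within the next h months is 1 - S(t+h)/S(t), where S is that employee's predicted survival curve. The sketch below assumes the curve is available as monthly survival values; the numbers are illustrative only, and the function name is hypothetical.

```python
def prob_leave_within(survival_by_month, t, h):
    """P(T <= t + h | T > t) = 1 - S(t + h) / S(t).

    survival_by_month[m] is the predicted probability that the employee is still
    employed after m months of tenure (survival_by_month[0] == 1.0).
    """
    if t < 0 or h < 0 or t + h >= len(survival_by_month):
        raise ValueError("curve does not cover the requested horizon")
    s_t = survival_by_month[t]
    if s_t <= 0:
        raise ValueError("employee has zero predicted survival at tenure t")
    return 1.0 - survival_by_month[t + h] / s_t


# Illustrative curve for one employee, months 0..6 of tenure.
curve = [1.00, 0.98, 0.96, 0.93, 0.90, 0.88, 0.85]
p3 = prob_leave_within(curve, t=3, h=3)  # 1 - 0.85 / 0.93, roughly 0.086
```

Conditioning on S(t) matters: an employee who has already stayed t months should not be scored with the unconditional probability of leaving by month t + h.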

The main value of this project was demonstrating that data analytics solutions can be applied to Human Resources challenges. The project is also a useful tool for reducing employee attrition at the company.

Value Brought by Dataiku:

Dataiku allows us to automate the whole process, from connecting to the data sources in the data lake to building and deploying predictive models that estimate the attrition probability of each employee in the organization. We run the process monthly, when new data becomes available, or ad hoc when needed.

Since most of our data sources live in a Hadoop ecosystem, we needed a system that integrates easily with our data lake while also giving us access to the most relevant data analysis tools. Dataiku met both needs: it connects to our data lake to retrieve all the information the project requires, lets us handle data preprocessing and enrichment without code through its visual recipes, and supports code recipes for specific processing tasks.

In addition, because our project was based on survival analysis, we needed specific R libraries for this statistical methodology and for applying it to big data. Dataiku integrates with R packages, including SparkR for processing large volumes of data.
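The specific R packages are not named in the write-up. For readers new to survival analysis, the Kaplan-Meier estimator is the basic non-parametric estimate of the share of employees still employed after each tenure, and it keeps still-active (censored) employees in the risk set instead of discarding them. The minimal Python version below only illustrates the method; it is not the team's R or SparkR code.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct tenure where at least one employee left.

    durations: observed tenure for each employee
    events: 1 if the employee left at that tenure, 0 if still active (censored)
    """
    departures = Counter(d for d, e in zip(durations, events) if e)
    exits = Counter(durations)  # everyone leaves the risk set at their own duration
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = departures.get(t, 0)
        if d:
            # Product-limit step: multiply by the share of at-risk employees who stayed.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Employees censored at t count as at risk at t, then drop out.
        at_risk -= exits[t]
    return curve


km = kaplan_meier([2, 3, 3, 5, 8, 8], [1, 1, 0, 1, 0, 1])
# [(2, 0.833...), (3, 0.666...), (5, 0.444...), (8, 0.222...)]
```

Counting employees censored at a tenure as still at risk at that tenure is the usual tie-handling convention for this estimator.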

Finally, the flow design lets HR users clearly follow every step of the process: data sources, preprocessing, building and testing the survival models, and writing the final results for each employee to a table connected to a Power BI dashboard that visualizes them.
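The last step of the flow produces one row per employee for the dashboard. The sketch below assumes hypothetical column names (`employee_id`, `prob_leave`, `rank`, `top_drivers`) and keeps the three largest drivers; in the project the table lives in Dataiku and is connected to Power BI, whereas this sketch simply writes a CSV string with Python's standard `csv` module.

```python
import csv
import io


def results_table(scores, top_k=3):
    """Build a CSV with one row per employee, sorted from highest to lowest risk.

    scores: list of dicts with employee_id, prob_leave (float) and
            contributions (dict of variable -> contribution); all hypothetical fields.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["employee_id", "prob_leave", "rank", "top_drivers"])
    ordered = sorted(scores, key=lambda s: s["prob_leave"], reverse=True)
    for rank, s in enumerate(ordered, start=1):
        # Rank drivers by absolute contribution, whichever direction they push.
        drivers = sorted(s["contributions"].items(), key=lambda kv: abs(kv[1]), reverse=True)
        top = "; ".join(name for name, _ in drivers[:top_k])
        writer.writerow([s["employee_id"], f"{s['prob_leave']:.3f}", rank, top])
    return buf.getvalue()


table = results_table([
    {"employee_id": "E1", "prob_leave": 0.12,
     "contributions": {"overtime_hours": 0.45, "tenure_years": 0.6, "salary_band": 0.15}},
    {"employee_id": "E2", "prob_leave": 0.31,
     "contributions": {"overtime_hours": -0.1, "tenure_years": 0.2}},
])
```

Storing the drivers alongside the probability lets the dashboard show not only who is at risk but also why, which matches the explanation requirement described earlier in the post.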
