Leidos, Inc. - Staffing Execution Evaluation and Prediction Capability Using Novel Combinations of S

Posted by chengke


Karen Elizabeth Cheng, Principal Investigator, Data Scientist
Virginia Cunningham, Data Scientist
Michael Kutsor, Marine Corps Client Account Manager

Country: United States

Organization: Leidos, Inc.

Leidos, formerly known as Science Applications International Corporation (SAIC), is an American defense, aviation, information technology (Lockheed Martin IS&GS), and biomedical research company headquartered in Reston, Virginia, that provides scientific, engineering, systems integration, and technical services. The Leidos Innovations Center (LInC) rapidly prototypes and fields solutions in areas such as Artificial Intelligence/Machine Learning, big data, cyber, surveillance systems, autonomy, sensors, applied biology, and directed energy.

Awards Categories:

  • Moonshot Pioneer(s)
  • Excellence in Research
  • Most Extraordinary AI Maker(s)
  • Partner Acceleration

Business Challenge:

The bridge from responding to the COVID crisis to thriving in a new normal, post-pandemic world will be the foundation of success for the Marine Corps and Leidos. Like all companies, Leidos is facing a long-term “war for talent.” This term refers to the increasingly fierce competition to attract and retain employees at a time when too few workers are available to replace baby boomers departing the workforce and millennials seeking full-time remote work.

Our project is a collaboration between Leidos’s AI/ML Accelerator and the Marine Corps Enterprise Network (MCEN) Group. The primary focus of our initiative is to evaluate and sustain a viable post-pandemic contractor workforce in support of the MCEN cyber mission around the globe.

As a result, Leidos is using Dataiku to assist with talent management by determining post-pandemic attrition trends, performing comparative analysis between IT companies across regions, and highlighting key factors for attracting and retaining qualified cyber support talent in the new normal, post-pandemic workforce. Leidos must also meet a very high level of effort (LOE) service requirement.

Therefore, Leidos is deploying a combination of statistical and machine-learning analytics to determine the most relevant data drivers for staffing analytics and to predict staffing requirements (e.g., attrition) and performance. This effort required a significant amount of data integration, tracking of skill trends, and continuous presentation of findings that MCEN leadership can use to identify areas of concern, so that resource managers can maintain Leidos’ required level of effort.

This program involves several key novel technical research areas, such as determining the best methods for key driver analysis using simulations and scoring algorithms, predicting attrition, taking into account environmental factors, and identifying the metrics that combine job fulfillment and resource retention.

Business Solution:

This project involved numerous datasets from multiple vendors that track both job positions and people. We used Dataiku heavily for rapid data integration, creating repeatable pipelines and workflow analysis, data acquisition and storage, data visualization and analysis, and web-based project deployment.


Dataiku’s rapid visualization of the raw and processed data allowed us to gain a quick understanding of the data distributions and data integrity. Dataiku’s data analysis features allowed us to identify missing data, invalid data, and outliers. This capability allowed us to find data quality issues from the outset that we were easily able to fix using a combination of built-in recipes and custom code.
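The kind of check described above (missing values, invalid values, and outliers) can be illustrated with a minimal, standard-library Python sketch. This is not Dataiku's implementation or the project's actual code. The column name `tenure_months`, the plausible range, and the use of Tukey's 1.5 × IQR rule are all assumptions chosen for illustration.

```python
import statistics


def profile_column(values, valid_range=None):
    """Return row indices of missing, invalid, and outlying entries in a numeric column."""
    missing = [i for i, v in enumerate(values) if v is None]
    present = [(i, v) for i, v in enumerate(values) if v is not None]

    # Invalid: outside a caller-supplied plausible range (a hypothetical business rule).
    invalid = []
    if valid_range is not None:
        lo, hi = valid_range
        invalid = [i for i, v in present if not (lo <= v <= hi)]
    invalid_set = set(invalid)

    # Outliers: Tukey's rule (1.5 * IQR beyond the quartiles), computed on valid values only.
    clean = [(i, v) for i, v in present if i not in invalid_set]
    outliers = []
    if len(clean) >= 4:
        q1, _, q3 = statistics.quantiles([v for _, v in clean], n=4)
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        outliers = [i for i, v in clean if not (low <= v <= high)]

    return {"missing": missing, "invalid": invalid, "outliers": outliers}


# Made-up sample column: one blank, one negative tenure, one implausibly large value.
tenure_months = [12, 18, None, 24, 15, -3, 20, 22, 17, 240]
report = profile_column(tenure_months, valid_range=(0, 600))
print(report)
```

Separating "invalid" from "outlier" keeps rule violations (a negative tenure) from distorting the quartiles used to flag merely unusual values.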

[Image: MCENFlow-03 (1).jpg]

As we had two developers on this effort, Dataiku’s collaboration features helped us to understand each other’s workflows quickly, share code, and explore results.

Since there were several areas where we wanted to perform model bakeoffs, we created new projects specifically to compare algorithm candidates. To understand the algorithm results, we simulated cases where the desired output is known and wrote custom scoring routines.
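As a rough illustration of a bakeoff on simulated data with a known answer, the sketch below generates an outcome from known driver weights, runs two candidate driver-ranking methods, and scores each by how many of the true top drivers it recovers. The post does not say which algorithms or scoring rules were used, so the candidates (absolute Pearson correlation and a deliberately weak variance baseline), the top-k recovery score, and the weights are all hypothetical stand-ins.

```python
import math
import random


def simulate(n_rows, weights, noise=1.0, seed=0):
    """Generate features and an outcome whose true driver weights are known."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in weights] for _ in range(n_rows)]
    y = [sum(w * x for w, x in zip(weights, row)) + rng.gauss(0, noise) for row in X]
    return X, y


def pearson(a, b):
    """Plain Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def rank_by_correlation(X, y):
    """Candidate A: rank features by absolute correlation with the outcome."""
    cols = list(zip(*X))
    scores = [abs(pearson(col, y)) for col in cols]
    return sorted(range(len(cols)), key=lambda j: -scores[j])


def rank_by_variance(X, y):
    """Candidate B: weak baseline that ignores the outcome entirely."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    scores = [sum((v - m) ** 2 for v in c) for c, m in zip(cols, means)]
    return sorted(range(len(cols)), key=lambda j: -scores[j])


def top_k_recovery(ranking, true_weights, k):
    """Custom score: fraction of the true top-k drivers found in the candidate's top k."""
    order = sorted(range(len(true_weights)), key=lambda j: -abs(true_weights[j]))
    return len(set(order[:k]) & set(ranking[:k])) / k


TRUE_WEIGHTS = [3.0, 0.0, 1.5, 0.0, 0.5]  # hypothetical ground truth
X, y = simulate(500, TRUE_WEIGHTS, seed=42)

candidates = {"correlation": rank_by_correlation, "variance_baseline": rank_by_variance}
results = {name: top_k_recovery(fn(X, y), TRUE_WEIGHTS, k=2) for name, fn in candidates.items()}
best = max(results, key=results.get)
print(results, "best:", best)
```

Because the ground truth is fixed by the simulation, each candidate's score is directly comparable, which is what makes a table of results across many candidates meaningful.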


Dataiku’s pipelines helped us to keep all of the models organized and to view and deploy the results in a shareable dashboard. We also found similar experiments on simulated data by other researchers and incorporated their test data generation cases into ours. Dataiku made it convenient to integrate multiple programming languages, such as R and Python.


We met with the Dataiku team routinely to learn all the ways that we could leverage Dataiku for our needs. Using Dataiku allowed us to deploy an informative initial dashboard within the first several weeks.


Business Area: Human Resources

Use Case Stage: Built & Functional

Value Generated:

It is imperative that Leidos have post-pandemic analytics tools that help hiring and resource managers hire and retain employees with a higher certainty of sustaining the Level of Effort (LOE) threshold. We have not yet put this tool into production in an integrated system; however, hiring and resource managers already use its operations and analytic outputs today for that purpose. This tool gives our program division a higher probability of meeting award fees greater than $1 million each year and of retaining a strong, upskilled workforce.

Additionally, findings from our AI/ML Accelerator’s technical research into the best analytic methods for various data cases (e.g., continuous versus categorical) are applicable to numerous projects. We are in the process of publishing our findings in research journals.

Value Brought by Dataiku:

Dataiku had a great impact on numerous aspects of this project throughout the entire pipeline, particularly in aiding our ability to hit the ground running. We were able to manipulate, combine, and store data in days, as opposed to months.

Furthermore, we were able to achieve the entire project with just two data scientists as opposed to needing to involve infrastructure personnel. Within the first few days of the project, we were able to find data entry errors and rapidly fix them. We relied heavily on the “analyze” feature of a dataset to quickly find data mistakes.

Not having to invest time in database setup and file system organization allowed us to focus on our core research interests that will address our machine learning challenges. By taking advantage of Dataiku’s web deployment capabilities, we saved a significant amount of time by avoiding the need to set up additional infrastructure such as web and database servers and were able to complete the entire effort with just a data science team.

Dataiku provided us with a framework for team contribution and for keeping process steps readable and maintainable. While this benefit is often overlooked, its longer-term impact on an organization is invaluable.

Once we provided our custom scoring method to rate multiple algorithms, we used Dataiku’s dashboard capabilities to quickly view and compare the results for our ultimate model selection. This allowed us to deploy and communicate our results with the internal MCEN Program Management team and other stakeholders quickly.

Value Type:

  • Improve customer/employee satisfaction
  • Increase revenue
  • Reduce cost
  • Save time
  • Increase trust
  • Provide actionable data for internal process improvement
  • Help accelerate algorithm exploration, criteria-based selection, refinement, and code maintenance

Value Range: Projected $10 million over the next seven years
