Aviva - Machine Learning Approach for Log Analytics

Name:

Mitesh Chandorkar with 

Aviva Team - Simon Sinfield (Manager), Robert Tullis, David Pearce

Wipro Team - Richardson Jebasundar, Narinder Saini, Rishi Khowala 

Title: Data Science Architect

Country: United Kingdom

Organization: Aviva

Aviva plc is a British multinational insurance company headquartered in London, England. It has customers across its core markets of the United Kingdom, Ireland, and Canada. In the United Kingdom, Aviva is the largest general insurer and a leading life and pensions provider. Aviva is also the second largest general insurer in Canada.

Awards Categories:

  • Most Impactful Transformation Story
  • Partner Acceleration

 

Business Challenge:

Large organizations with digital capabilities have significant IT operational challenges, dealing with huge IT assets, numerous processes, and multiple support teams. Businesses are evolving at a rapid pace, and there is a boom in digital and IT estates to support them. Today's organizations are faced with keeping up with an ever-expanding IT estate, massive amounts of data, customer demands, and process inefficiencies.

The Digital Operations Management Engine was set up to address these IT operational challenges and help streamline the existing processes. Historic incident management data and log details are analysed to understand current IT operations and their challenges. Wipro’s AI Solutions uses its industry- and business-focused solutions to create a seamless connection throughout the IT enterprise value chain.

Wipro partnered with Aviva on this journey with the goals of:

Enhancing incident management to decrease the average time to recovery.

Improving Service Availability through proactive action on the assets as required.

Acting ahead of major disruption of services by being more proactive in identifying IT problems through logs.

Increasing operational efficiency by identifying the root cause.

Enhancing customer experience by identifying unreported customer incidents.

Reducing repetitive manual decision-making tasks.

 

Business Solution:

Dataiku is the backbone of this solution. Using Dataiku, we produced numerous use cases, many of which are functional and in production, while others are being worked on as proofs of concept. Key use cases include:

Incident Classification (Supervised) – Using NLP, the classifier assigns each incoming incident to its probable root cause with 97% accuracy.

Anomaly Detection Using Log Data (Unsupervised) – It is difficult to find abnormalities manually in an estate of more than 300 bots created for automation. The solution fetches statistics from the logs every 15 minutes and alerts the support team to anomalies.

[Image: B_Anomaly_Graph.PNG]

 

Prediction of Cluster Failure (Semi-Supervised) – The model, trained on 23 different infrastructure metrics, was able to foresee server cluster failures around two to three hours in advance.

[Image: W_Anomaly_Graph.PNG]

 

Event Sequence Modelling (Unsupervised) – The LSTM-based deep learning model learns the normal pattern and predicts anomalies from ingested log data. This is a proven concept, and we are planning to scale this solution.
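
To make the log-based anomaly detection use case above more concrete, here is a minimal sketch of flagging unusual values in statistics sampled every 15 minutes. It uses a trailing-window z-score, which is our stand-in for illustration only; the algorithm actually used in production is not described here. The function name, window size, threshold, and sample error counts are all hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(values, window=8, threshold=3.0):
    """Return indices of samples that deviate strongly from the trailing window.

    values    : metric samples, one per 15-minute interval (assumed layout)
    window    : number of previous samples used as the baseline (assumed)
    threshold : z-score above which a sample is flagged (assumed)
    """
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(values):
        if len(history) == window:
            baseline = mean(history)
            spread = pstdev(history)
            # A flat baseline has zero spread, so the z-score is undefined;
            # in that case the sample is not flagged.
            if spread > 0 and abs(value - baseline) / spread > threshold:
                anomalies.append(i)
                continue  # keep flagged samples out of the baseline
        history.append(value)
    return anomalies


# Hypothetical error counts per 15-minute interval for a single bot.
error_counts = [4, 5, 3, 4, 6, 5, 4, 5, 4, 40, 5, 4]
flagged = detect_anomalies(error_counts)
print(flagged)
```

Flagged samples are excluded from the baseline so that one spike does not hide the next one. In a real deployment the flagged indices would be turned into alerts for the support team.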

The NLP preparation and processing required for operational analytics were done in Dataiku (DSS v9.x) using built-in recipes and custom Python code. Dataiku enabled us to:

Ingest data from various sources – Incident and log data were ingested for processing from MySQL, PostgreSQL, and log data exposed via API.

Low code/no code data processing – The existing visual recipes helped complete data transformations quickly. For complex transformations, custom Python recipes were used.

Inbuilt machine learning algorithms – Multiple models were trained with very little configuration, and several machine learning and deep learning models were developed.

Data exploration made easy – Charts, dashboards, and statistics helped with quick data exploration.

Orchestration layer for the end-to-end solution – Job scheduling and scenarios were used as an orchestration layer to control the flow of the data pipelines.
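
As an illustration of the kind of logic a custom Python recipe might hold for the incident classification use case, the sketch below tokenizes incident text and trains a small bag-of-words multinomial naive Bayes classifier. This is an assumption chosen for brevity: the features, algorithm, and labels used in production are not described here, and the class name, sample incidents, and root-cause labels are hypothetical. It deliberately uses only the Python standard library and no Dataiku API.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.class_counts = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocabulary = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the smoothed log likelihood of each token.
            score = math.log(doc_count / total_docs)
            label_total = sum(self.token_counts[label].values())
            for token in tokens:
                count = self.token_counts[label][token]
                score += math.log((count + 1) / (label_total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical incident descriptions and root-cause labels.
train_texts = [
    "Database connection timeout on policy service",
    "Connection pool exhausted, database not responding",
    "Disk usage at 98% on application server",
    "Server out of disk space, logs not rotating",
    "User cannot log in, password expired",
    "Account locked after failed login attempts",
]
train_labels = ["database", "database", "storage", "storage", "access", "access"]

model = NaiveBayesClassifier().fit(train_texts, train_labels)
prediction = model.predict("database connection refused")
print(prediction)
```

With real incident data, the same pattern would be trained on historic tickets and their confirmed root causes, and evaluated on a held-out set before use.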

 

Business Area: IT/Cybersecurity/Data

Use Case Stage: In Production

 

Value Generated:

The engine enables the support staff to consider why any given issue keeps occurring and fix it permanently.

The solution increased operational and resource efficiency through root cause labelling of incoming incidents. It saved circa 200 hours of effort per month (resource efficiency) and circa 700 hours of MTTR (Mean Time to Resolution) per month (operational efficiency).

What’s more, the solution enhanced customer experience by sending near real-time alerts to the support team whenever anomalies appeared in the bots' log data. This supported a proactive approach to monitoring and fixing the estate as necessary.

By identifying cluster failures in advance, the run team could take preventative measures that increased service availability, decreased incident rates, and reduced associated work.

When a cluster of abnormal sequences is noticed in a short period of time, the event sequence model helps identify the affected customer journey and flag potentially detrimental incidents in advance. Additionally, it can help identify unreported incidents (not raised by users) and get them fixed, thus improving customer experience.

Several other NLP clustering use cases still in development are assisting different IT teams in providing MI analytics reports with additional dimension(s) based on the clustering.

 

Value Brought by Dataiku:

Dataiku is the platform for Enterprise AI, systemizing the use of data for exceptional business results. It improved speed and agility through increased team efficiency, enhanced tech stack efficiency, improved risk management and governance through transparency and explainability, upskilling, and networking with resources such as the Dataiku Academy and Dataiku Community.

Key elements that generated value include:

● Dataiku’s MLOps capability, which helped us deliver a working model quickly and on time.

● Dataiku’s role in improving team productivity and speed. The visual recipes aided the rapid construction of ETL procedures, and model building often took only a few hours when built-in algorithms were used.

● The orchestration layer in the form of scenarios, which has helped us build end-to-end solutions with less dependency on other orchestration tools.

● The new plugins and features that are constantly added, which are very useful in PoCs when we need quick wins.

● The Dataiku Knowledge Base and the aforementioned Dataiku Academy, which are excellent sources of learning. Because the information is available in one place, the teams were able to self-learn and contribute to projects within a few weeks.

 

Value Type:

  • Improve customer satisfaction
  • Save time
  • Other - Operational Efficiency Improvements

Value Range: Unknown

Comments
rohits
Level 1

Very good example of Wipro Dataiku collaboration and benefits brought to customer

Lakkala_Rakesh
Level 1

Good job on leveraging NLP for incident classification.

ajaywish777
Level 1

Impeccable use of this platform I would say.

rahuljot
Level 1

Great example of solving a log analysis problem using AI through the Dataiku platform. The solution will definitely add a lot of value for the customer.

srikanthk
Level 1

Excellent demonstration of the "art of the possible" using AI-ML. I am sure there is a lot more to gain by extending it further as a logical phase 2.

Manishbhushanuk
Level 1

Very good use of capability to solve real issues and create value, reduce toil. 

debmalyabiswas
Level 1

Great work in pushing the state of the art.

Actual implementations of Multivariate analysis still remain very rare in ITOps/AIOps scenarios, so particularly pleased to see this.

Prediction of Cluster Failure (Semi-Supervised) – The model, which was trained using 23 different infrastructure metrics, was able to foresee server cluster failures well in advance, by around two to three hours.

 

krichougule
Level 1

Awesome! The solution will definitely add a lot of value for the customer.

tripatsa
Level 1

Taking up an industry wide challenge and showcasing the capabilities of AI to resolve it seems to be a marriage made in heaven. 

Bhajandeep
Level 1

The solution demonstrates how to leverage an end-to-end connected workflow from data ingestion -> machine learning -> visualization. This type of connected workflow is needed for use cases across domains. I appreciate the team's effort in putting it together.

anindito
Level 1

Excellent proof of value utilizing the power of Dataiku and smart application of data science to solve a real operational problem.

alahiri
Level 1

Good implementation of Dataiku for AI Ops

Vaikunda
Level 1

Nice use case to illustrate the business case.

Publication date:
05-09-2022 08:52 AM
Last update:
09-05-2022 10:52 AM