
Schlumberger - Streamlining & Augmenting the Well Evaluation Process at Scale

Rasesh Saraiya

Data Scientist

Saudi Arabia


Schlumberger is a technology company that partners with customers to access energy. Our people, representing over 160 nationalities, are providing leading digital solutions and deploying innovative technologies to enable performance and sustainability for the global energy industry.

Awards Categories:

  • Organizational Transformation
  • AI Democratization & Inclusivity
  • Value at Scale
  • Alan Tuning


To inform Schlumberger’s drilling operations in one of our locations, our team (Digital & Integration) collects a large volume of unstructured data from the field in the form of reports. These reports contain a wealth of information that had not been tapped into before.

Our customers receive so many reports that it is impossible to go through all of them manually to extract insights. We therefore needed a way to process unstructured data at scale.

The main objective of this project is to facilitate Offset Well Analysis (OWA), which is a process every drilling engineer must complete before a new well is planned. It includes looking at all wells in the vicinity to identify any potential issues.

This used to be a highly manual process, in which the engineers had to get a list of all the wells close-by, find a specific folder on a hard drive, access the relevant reports for each well, and read through each and every one of them to manually extract information on issues.

Since our activity across various locations is expanding, we needed a more efficient process to automatically identify which wells should be investigated, and to point users directly to the occurrences of known issues.


We’ve built customized pipelines to ingest these reports, parse them, carry out the Extract, Transform, and Load (ETL) process, consume them in a machine learning application, and make the insights available across the organization via a custom application. Here is the step-by-step process:

1. Setting up an ETL pipeline to extract meaningful data

We created a Dataiku project to accept these reports and store them on a Network File System (NFS) drive in Dataiku. Thousands to hundreds of thousands of documents are available to process. We built an ETL pipeline to parse this information, extract meaningful data, and structure it. A specific schema was applied to transform the unstructured files for ingestion into a structured data source - in our case, MongoDB.
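The parse-and-structure step can be sketched as follows. This is an illustrative sketch only, not the production pipeline: the report layout, the field names (`well`, `date`, `section`, `comment`), and the `parse_report` helper are all assumptions, since the actual schema is not described here.

```python
import re
from dataclasses import dataclass, asdict

# Hypothetical report layout: "Key: value" header lines, then free-text comments.
HEADER_RE = re.compile(r"^(Well|Date|Section):\s*(.+)$", re.IGNORECASE)


@dataclass
class ReportRecord:
    """Assumed target schema: one document per free-text comment."""
    well: str
    date: str
    section: str
    comment: str


def parse_report(text: str) -> list[dict]:
    """Turn one raw report into a list of schema-conforming documents."""
    header = {"well": "", "date": "", "section": ""}
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADER_RE.match(line)
        if match:
            # Header lines update the context applied to later comments.
            header[match.group(1).lower()] = match.group(2).strip()
        else:
            # Any non-header line is treated as one free-text comment.
            records.append(asdict(ReportRecord(comment=line, **header)))
    return records


sample = """Well: W-001
Date: 2020-01-15
Section: 12.25in
Partial losses observed while drilling.
Pipe stuck at connection, worked free."""

documents = parse_report(sample)
```

Each resulting dictionary has the same four keys, so the list could then be written to a MongoDB collection, for example with pymongo's `insert_many`.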

2. Gaining insights through Natural Language Processing (NLP)

We were now able to run NLP on these files, including regular expression-based queries.

These reports are consumed by various teams in the organization, who need to draw insights from them - for instance drilling teams looking for specific keywords such as “losses” or “stuck pipes”. With the data centralized in MongoDB, we were able to write a simple query to retrieve the information we needed.
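A keyword query of this kind might look like the sketch below. The `comment` field name and both helper functions are hypothetical; `$regex` and `$options` are standard MongoDB query operators. The `matches` function is only a local stand-in so the example runs without a database.

```python
import re


def keyword_filter(keywords: list[str], field: str = "comment") -> dict:
    """Build a MongoDB filter matching any of the keywords, case-insensitively."""
    pattern = "|".join(re.escape(k) for k in keywords)
    return {field: {"$regex": pattern, "$options": "i"}}


def matches(doc: dict, mongo_filter: dict) -> bool:
    """Local stand-in for the database: apply the same regex in Python."""
    (field, cond), = mongo_filter.items()
    flags = re.IGNORECASE if "i" in cond.get("$options", "") else 0
    return re.search(cond["$regex"], doc.get(field, ""), flags) is not None


query = keyword_filter(["losses", "stuck pipe"])

docs = [
    {"well": "W-001", "comment": "Partial LOSSES observed while drilling."},
    {"well": "W-001", "comment": "Stuck pipe at connection."},
    {"well": "W-002", "comment": "Drilled ahead without issues."},
]
hits = [d for d in docs if matches(d, query)]
```

With pymongo, the same `query` dictionary would be passed to `collection.find(query)`.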

3. Building a language model for automated query expansion

Because these reports are written in the field, they contain many spelling and contextual mistakes, which limit our capacity to derive insights. We were therefore looking to build a more powerful model.

We took the structured data parsed in MongoDB and used it to train a language model, using unsupervised techniques. As an output, the model associates each word with a vector. This enables us to do automated query expansion using a Python recipe in Dataiku - for instance, associating “bitballing” and “spersene”, or “camel” and “snake”.
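Query expansion over word vectors can be sketched as a nearest-neighbor lookup by cosine similarity. The embedding model is not named here, so the example below uses toy three-dimensional vectors invented purely for illustration; `expand_query`, `top_n`, and `min_sim` are hypothetical names.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(term, vectors, top_n=3, min_sim=0.8):
    """Return up to top_n (term, similarity) pairs closest to `term`."""
    if term not in vectors:
        return []
    scored = [
        (other, cosine(vectors[term], vec))
        for other, vec in vectors.items()
        if other != term
    ]
    # Drop weak neighbors, then keep the strongest ones.
    scored = [(word, sim) for word, sim in scored if sim >= min_sim]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy vectors, invented for illustration only (a trained model would supply these).
toy_vectors = {
    "losses": [0.9, 0.1, 0.0],
    "lossses": [0.88, 0.12, 0.02],  # a field misspelling
    "loss": [0.85, 0.2, 0.0],
    "stuck": [0.0, 0.9, 0.3],
    "cement": [0.1, 0.0, 0.95],
}

expanded = expand_query("losses", toy_vectors)
```

Here the misspelling and the singular form are returned as expansions of "losses", while unrelated terms fall below the similarity threshold.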

This gives us insights that we never thought possible. We can query around 6 to 8 million documents in a matter of milliseconds - not only with designated keywords, but with expanded keywords that counter spelling mistakes and draw insights from all the data at our disposal.

Now that this machine learning model is in place, how do we operationalize it so that consumers can benefit from its insights? We wanted to deploy it as a model that would take in a sentence and give an output associating predefined keywords with the sentence entered.

4. Operationalizing the language model to score past data

We carried out intensive research with users to understand how this model could help them. We came up with five keywords, each linked to incidents that most people look for in the reports.

We built an application to filter out the existing data, analyze all the comments from these reports, and showcase only the top 3-4 instances where the most-searched keywords were to be found.

After validating the wireframe with users, we deployed it as an API with a very simple user interface. It accepts one of these sentences as the input, already knowing which five words it has to score the sentence on, based on references to the exact words or to related concepts. The output covers the five words, each with a score representing the possibility of that event having taken place in the sentence.
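The scoring idea can be sketched as below, under stated assumptions. The deployed model's internals are not described here, so this example simply takes, for each keyword, the best similarity of any related term found in the sentence. The `EXPANSIONS` table, its similarity values, and `score_sentence` are hypothetical, and only two keywords are shown rather than five.

```python
import re

# Hypothetical expansions: incident keyword -> {related term: similarity}.
# Values are invented for illustration; only two of the five keywords are shown.
EXPANSIONS = {
    "losses": {"losses": 1.0, "lossses": 0.95, "loss": 0.9},
    "stuck pipe": {"stuck": 1.0, "stuk": 0.93, "tight": 0.7},
}


def score_sentence(sentence: str, expansions: dict) -> dict:
    """Score a sentence per keyword: best similarity of any matching token."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return {
        keyword: max(
            (sim for term, sim in terms.items() if term in tokens),
            default=0.0,  # no related term found in the sentence
        )
        for keyword, terms in expansions.items()
    }


scores = score_sentence(
    "Pipe stuk at 8,500 ft, partial loss to formation.", EXPANSIONS
)
```

The result is one score per keyword, so a misspelled "stuk" still contributes to the "stuck pipe" score.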

We proceeded to use this model to score all the sentences in the past, which enabled us to select a certain well, look at all the sentences associated with that well throughout its lifetime, and score them.

Now, how do we wrap this API in an app to make it more widely available?

5. Building a custom application for users to access & visualize insights

We recently developed a custom application using a .NET backend and an Angular frontend. It communicates with the same MongoDB collection that Dataiku had written the outputs to, alongside the API we had already deployed.

The Angular frontend components were very simple, with the main goal of condensing the information per well per section and showing the highest probability of these events.

The visual output consisted of a color-coded table showing all wells and the known issues identified in various components. The clarity and simplicity of this representation make it invaluable for users who need to identify and dig into the most important issues to be solved.
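The condensing step behind such a table can be sketched as follows. The actual app uses .NET and Angular; this Python version only illustrates the logic, and the row layout, the `summarize` and `color` helpers, and the color thresholds are all assumptions.

```python
from collections import defaultdict


def summarize(scored_sentences: list[dict]) -> dict:
    """Keep the highest score per keyword for each (well, section) pair."""
    table = defaultdict(dict)
    for row in scored_sentences:
        cell = table[(row["well"], row["section"])]
        for keyword, score in row["scores"].items():
            cell[keyword] = max(score, cell.get(keyword, 0.0))
    return dict(table)


def color(score: float, amber: float = 0.5, red: float = 0.8) -> str:
    """Map a score to a table color; the thresholds are arbitrary."""
    if score >= red:
        return "red"
    if score >= amber:
        return "amber"
    return "green"


# Invented sample rows: one per scored sentence.
rows = [
    {"well": "W-001", "section": "12.25in", "scores": {"losses": 0.9, "stuck pipe": 0.1}},
    {"well": "W-001", "section": "12.25in", "scores": {"losses": 0.2, "stuck pipe": 0.6}},
    {"well": "W-001", "section": "8.5in", "scores": {"losses": 0.0, "stuck pipe": 0.3}},
]

summary = summarize(rows)
colors = {
    key: {keyword: color(score) for keyword, score in cell.items()}
    for key, cell in summary.items()
}
```

Taking the maximum per cell means a single high-scoring sentence is enough to flag a section, which matches the goal of surfacing the highest probability of each event.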


This project brought tremendous value to drilling planning operations, and was most notably impactful around three main areas:

1. Time savings in the well evaluation cycle & risk analysis

Our drilling engineers are now able to conduct well evaluation cycles in a matter of seconds, with just a few clicks - rather than spending hours or even a few days reading through all reports.

The Angular app shows a scored value of the risk, and also includes a radius-based search that sends API calls to the deployed model and returns only the wells located in the vicinity of the planned well. This saves considerable time in the analysis.
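The geometric part of a radius-based search can be sketched with the haversine great-circle distance. How the real app computes distances is not stated here, so this is only one plausible approach; the coordinates, well names, and `wells_within` helper are invented for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def wells_within(planned: dict, wells: list[dict], radius_km: float) -> list[str]:
    """Names of existing wells within radius_km of the planned well."""
    return [
        w["name"]
        for w in wells
        if haversine_km(planned["lat"], planned["lon"], w["lat"], w["lon"]) <= radius_km
    ]


# Invented coordinates for illustration.
planned_well = {"name": "PLAN-01", "lat": 25.0, "lon": 49.0}
existing = [
    {"name": "W-001", "lat": 25.01, "lon": 49.01},
    {"name": "W-002", "lat": 25.5, "lon": 49.5},
]

offset_wells = wells_within(planned_well, existing, radius_km=5.0)
```

Only the names returned by this filter would then need their scored sentences fetched, which keeps the analysis focused on true offset wells.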

2. Reducing the risk of human error

Since reports are written directly in the field, chances are high that some of their content is misspelled or otherwise not fully accurate. Similarly, at the other end of the process, drilling engineers might miss a key piece of information if they spend days reading through reports.

As all the data is now analyzed by the system, which shows only the most relevant data points to end users, this minimizes the risks related to bad data or misinformation at both ends of the process.

3. Driving data democratization across the organization

Data used to be scattered in different places, and Dataiku provides a central interface to consolidate it into one database. The platform also centralizes insights into one single API point, so that any user in Schlumberger can access them at any time.

Dataiku has brought data closer to domain users, and has also added value by giving them a platform to gain insights from their data. This applies not only to several hundred well planners, but to thousands of people across the organization who are empowered by access to this invaluable wealth of information.

Last update: 08-03-2021 06:30 PM