Unilever - Building Self-service NLP for Analysts Worldwide


Linda Hoeberigs, Head of Data Science & AI, PDC Lab
Ash Tapia, Data Partnerships & Tools Stack Manager

United Kingdom


Every day, 2.5 billion people use a Unilever product to look good, feel good or get more out of life. Our purpose is to make sustainable living commonplace. We are home to some of the UK’s best-known brands like Persil, Dove and Marmite, plus some others that are on their way to becoming household favourites like Seventh Generation and Grom. We have always been at the forefront of media revolutions, whether that be the first print advertisements in the 1890s or in 1955, when we became the first company to advertise on British TV screens. Experimentation and bravery drive us and have helped us become one of the UK’s most successful consumer goods companies.

Awards Categories:

  • AI Democratization & Inclusivity


Our Unilever People Data Centre (PDC) teams across the globe deal with vast amounts of unstructured text data to gain insight into our customers, how they engage with our brands and products, and which needs we have yet to tap into. The industry is moving at a rapid pace, which in turn requires rapid generation of insights to stay on top of the latest trends.

The sheer volume of data and the skills required to analyse it efficiently exacerbate this problem. The answers our marketeers, product research and development, and supply chain specialists need also require analytics approaches tailored to the business.

Analyzing text data is a complex task and often requires understanding complex language models and Natural Language Processing techniques, expertise most of our marketeers do not have. Their skills are focused on data analysis, so we had to find a way to synthesize our text data into something that can be analyzed by our PDC analysts, without compromising on our technical data science approach.

Building on this, the solution had to be flexible and able to work in multiple languages, with the aim of supplying all analysts with a tool that would be accessible in their market.


This solution was born via the democratization of a project flow made up of several code recipes. As with most data science work, it is often unknown how applicable and reusable a piece of code is until it is put into practice. In this case, we were able to take these code recipes written by our data scientists and encapsulate them into a plugin by collaborating with our data engineers.

Using the ability to create custom plugins, we developed a plugin called Language Analyser which is readily available for use by anyone in the PDC across the globe. It has allowed hundreds of analysts to be able to apply Natural Language Processing (NLP), increasing the efficiency, quality, and granularity of their work.

What’s more, a comparison mode for two text datasets was implemented using support for multiple datasets as input to a single plugin, thus increasing the range of applications of this tool.

To solve the challenge of flexibility, we employed a custom front-end built with HTML, CSS, and JavaScript. Creating a user-friendly interface allowed us to replace technical terminology and algorithm names with analyst-friendly terms, breaking down the barrier between the two.

To use this plugin, an analyst merely needs to supply a dataset and select their pre-processing steps, such as removing spammy authors from social media data, removing unnecessary stop words, and cleaning the data of noise. From there, they can choose which NLP techniques to apply to their data, including identifying general grammatical entities, emojis, and Unilever-relevant terms such as ingredients and fragrances. Building on this, they can choose to enrich their analysis with pre-tagged sentiment, adding a layer of depth to generated insights, such as which emojis are used in a positive context when discussing vegan foods.
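The plugin's internals are not shown here, but a minimal sketch of this kind of pipeline (stop-word removal, emoji detection, and dictionary-based sentiment tagging) could look like the following. All word lists, names, and heuristics below are illustrative assumptions, not Unilever's actual resources:

```python
import re
import unicodedata
from dataclasses import dataclass

# Illustrative word lists only; a production tool would use curated,
# per-language resources.
STOP_WORDS = {"the", "a", "an", "and", "or", "is", "are", "to", "of"}
POSITIVE_WORDS = {"love", "great", "tasty", "good"}
NEGATIVE_WORDS = {"hate", "bad", "awful"}

def is_emoji(ch: str) -> bool:
    # Crude heuristic: characters outside the Basic Multilingual Plane,
    # or classified by Unicode as "Symbol, other".
    return ord(ch) > 0xFFFF or unicodedata.category(ch) == "So"

@dataclass
class AnalyzedPost:
    tokens: list      # lowercased words with stop words removed
    emojis: list      # emoji characters found in the text
    sentiment: str    # "positive", "negative", or "neutral"

def analyze(text: str) -> AnalyzedPost:
    emojis = [ch for ch in text if is_emoji(ch)]
    words = re.findall(r"[a-zA-Z']+", text.lower())
    tokens = [w for w in words if w not in STOP_WORDS]
    score = (sum(w in POSITIVE_WORDS for w in tokens)
             - sum(w in NEGATIVE_WORDS for w in tokens))
    sentiment = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return AnalyzedPost(tokens=tokens, emojis=emojis, sentiment=sentiment)
```

For example, `analyze("I love the vegan mayo 😍")` would drop the stop word "the", pick up the emoji, and tag the post as positive sentiment.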

Our data scientists are often focused on accuracy and the behind-the-scenes processes that turn unstructured data into something more structured; our analysts, on the other hand, are focused on finding insights and presenting them back to their stakeholders.

Our solution makes use of static insights within Dataiku to create a way of visualizing the data returned from the pre-processing and data science processes. Being able to leverage JavaScript libraries such as D3 allowed us to collaborate with a dedicated design team to present the data in a way that aided information presentation and insight discovery.


The tool has been received extremely well by analysts and other data scientists. It sees strong usage every day across a wide span of research projects. The outputs serve both as inspiration for further analyses such as theme detection, and as discovery of language intricacies.

One of the key reasons this solution was implemented as a plugin was that it gave a single interface to multiple common NLP operations. As a result, analysts can use the Language Analyser for data cleaning, tagging Unilever entities, or completing a full comparative language analysis of two datasets. It allows analysts to see their text data in a new light in a matter of minutes. It goes without saying that this is now our analysts’ go-to text analysis tool.

In addition, this tool has been up and running for more than a year and has changed the way Unilever informs marketing strategy. Hellmann’s found out which foods over-index for lunch compared to other meal moments, and was thus able to create more relatable meal moments in its campaigns. The tool has also informed Comfort’s tone-of-voice strategy by finding out which words over-index for millennials.
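Over-indexing of this kind is typically computed by comparing a term's relative frequency in a focus dataset (say, lunch mentions) against a baseline (other meal moments). A simple sketch of that calculation, using add-one smoothing as an assumed design choice rather than Unilever's exact method:

```python
from collections import Counter

def over_index(focus_tokens, baseline_tokens):
    """Ratio of each term's relative frequency in the focus set vs. the baseline.
    A ratio above 1.0 means the term over-indexes in the focus set."""
    focus = Counter(focus_tokens)
    base = Counter(baseline_tokens)
    f_total = sum(focus.values())
    b_total = sum(base.values())
    scores = {}
    for term, f_count in focus.items():
        b_count = base.get(term, 0)
        # Add-one smoothing so terms absent from the baseline
        # don't cause a division by zero.
        scores[term] = (f_count / f_total) / ((b_count + 1) / (b_total + 1))
    # Highest over-indexing terms first.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

With lunch tokens `["soup", "sandwich", "soup", "salad"]` against a baseline of `["pasta", "salad", "steak", "pasta"]`, "soup" would surface as the most over-indexed lunch term.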

The current team continues to improve the tool by integrating it with other existing capabilities, for example topics and themes. Which adjectives and adverbs best describe each theme? Which beauty ingredients are most common for each topic? As we uncover more insights, our questions grow more advanced, and this requires a forward-thinking strategy.

In addition, as we expand globally, questions like these are starting to pour in all the way from Mexico to Japan. We have continuously worked to improve our language coverage, with the tool going from supporting 12 languages at the start to 30 languages currently. We design and develop with the analyst in mind, and market coverage has been a significant milestone.

The Language Analyser has allowed data scientists, data engineers, and visualization experts to collaborate in a way that was previously siloed. It has paved the way for future projects in terms of how we think about which data science processes are democratized into plugins for our global analysts to use.

At the end of the day, the Language Analyser has fundamentally changed how we view text analysis and visualization – it has opened the business to new ideas and possibilities across the globe.
