Unilever - Designing a Responsible, Self-service Tool for Natural Language Processing


Ash Tapia (Linda Hoeberigs, Head of Data Science and AI, PDC Lab | CMI People Data Centre)

Data Partnerships & Tools Stack Manager

United Kingdom


Every day, 2.5 billion people use a Unilever product to look good, feel good, or get more out of life. Our purpose is to make sustainable living commonplace. We are home to some of the UK’s best-known brands like Persil, Dove, and Marmite, plus others on their way to becoming household favourites, like Seventh Generation and Grom. We have always been at the forefront of media revolutions, whether the first print advertisements in the 1890s or, in 1955, becoming the first company to advertise on British TV screens. Experimentation and bravery drive us and have helped us become one of the UK’s most successful consumer goods companies.

Awards Categories:

  • Responsible AI


Our Unilever People Data Centre (PDC) teams across the globe deal with vast amounts of unstructured text data on a daily basis to gain insight into our customers, how they engage with our brands and products, and which needs we have yet to tap into. The industry moves at a rapid pace, which in turn demands rapid generation of insights to stay on top of the latest trends.

The sheer amount of data, and the skills required to analyse it efficiently, exacerbate this problem. The answers our marketeers, product research and development, and supply chain specialists need also require analytics approaches tailored to the business.

Analysing text data is a complex task: it often requires an understanding of language models and Natural Language Processing (NLP) techniques that most of our marketeers do not have. To help with this, our data scientists and software engineers in PDC have built a range of NLP methodologies and plugins, the most complex being the Language Analyser.

The Language Analyser uses pre-trained language models for Part-of-Speech (POS) tagging and Named Entity Recognition (NER), performs string matching against existing entities relevant to Unilever, and visualises a range of insights in an interactive dashboard as network graphs, word clouds, and sentiment-scale bubble charts, among others.
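To illustrate the string-matching step, here is a minimal sketch in pure Python. The entity dictionary and the `match_entities` helper are hypothetical stand-ins; the actual plugin matches against Unilever's own entity lists and combines this with the pre-trained model output.

```python
import re
from collections import defaultdict

# Hypothetical entity dictionary; the real plugin draws on entity lists
# curated for Unilever's brands and categories.
KNOWN_ENTITIES = {
    "Dove": "BRAND",
    "Persil": "BRAND",
    "Marmite": "BRAND",
    "shampoo": "PRODUCT_TYPE",
}

def match_entities(text):
    """Return {entity: [character offsets]} for known entities found in text."""
    matches = defaultdict(list)
    for entity in KNOWN_ENTITIES:
        # Case-insensitive, whole-word matching against the dictionary.
        for m in re.finditer(rf"\b{re.escape(entity)}\b", text, re.IGNORECASE):
            matches[entity].append(m.start())
    return dict(matches)

text = "I switched from another shampoo to Dove, and Dove works better."
print(match_entities(text))
```

Offsets like these are what downstream visualisations (word clouds, network graphs) can aggregate over.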

Responsible AI is fundamental to ensuring our business is responsible, ethical, and sustainable, and this is key across all business areas. We set out to understand whether the Language Analyser, one of the NLP plugins most used by our analysts, is ethical, and whether the way analysts use it within Dataiku is ethical.


To assess how ethical the Language Analyser is, and the way it is used as part of our Dataiku ecosystem, we engaged Adriano Koshiyama, a Research Fellow in Computer Science at University College London (UCL) and co-founder of Holistic AI, a start-up focused on auditing and providing assurance of AI systems. Adriano has worked as a data scientist for many years across industries such as retail, finance, recruitment, and R&D.

All aspects of the plugin were assessed: its internal components, the wider environment in which it sits, and the kinds of datasets analysts pull through it. Since the plugin is available within Dataiku, we can easily assess how and where people use it. Dataiku's collaborative, open environment has enabled full transparency on how the plugin is used across different research projects; we are able to monitor usage and assess its applications.

Holistic AI assessed our capability on privacy, fairness, robustness, and explainability using the following assessment framework:

[Image: Assessment framework]

Thanks to the use of Dataiku, we were able to clearly outline each step of our development process, as both current and historical versions had been stored in the flow and using the timeline versioning.

We were also able to share how, when, and by whom the plugin was used, thanks to the usage stats available on a dashboard we created in Dataiku. Furthermore, it was extremely clear where the data came from, thanks to the end-to-end visibility of each flow in the project. All of this meant that we were able to provide white-box access, and could be judged on each of those dimensions: privacy, fairness, robustness, and explainability.
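A usage dashboard of this kind boils down to aggregating plugin-run events by user and project. The sketch below uses hypothetical log records and field names purely for illustration; the actual dashboard is built on the usage data Dataiku exposes.

```python
from collections import Counter

# Hypothetical usage records standing in for the plugin-usage events
# surfaced on the Dataiku dashboard (who ran it, in which project, when).
usage_log = [
    {"user": "analyst_a", "project": "SNACKS_TRENDS", "date": "2021-03-01"},
    {"user": "analyst_b", "project": "HAIRCARE_REVIEWS", "date": "2021-03-02"},
    {"user": "analyst_a", "project": "SNACKS_TRENDS", "date": "2021-03-04"},
]

def usage_by(records, field):
    """Count plugin runs grouped by a record field, e.g. 'user' or 'project'."""
    return Counter(r[field] for r in records)

print(usage_by(usage_log, "user"))     # runs per analyst
print(usage_by(usage_log, "project"))  # runs per project
```

Grouping by date in the same way gives the usage-over-time view an auditor would ask for.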



The Language Analyser plugin received the green stamp of approval from the AI auditing start-up. After a full assessment of one of our most successful plugins found it to be fair, responsible, ethical, and unbiased, our analysts and data scientists can now be assured that the tool they use as part of their work fits within our responsible business practices.

The plugin passed the assessment with flying colours on each dimension, thanks to being fully transparent in Dataiku, and was the first capability in all of Unilever to do so. More widely, we can assure the business that Dataiku supports our teams in ensuring the longevity and continued transparency of our capabilities.

Additionally, we have full visibility and control over what we develop, how we develop it, and which components we bring together to design a responsible tool. Combined with sufficient version control, we are able to mitigate risks and know which areas need particular attention.


  • Triveni (Dataiker)

    This is a fantastic use case, and it is awesome to learn about how Unilever is working towards better language models. As an industry DS for Responsible AI at Dataiku, I would be interested in speaking with your team more! Please let me know if that is possible.
