besea.n - Analyzing News Articles to Further Advocacy Work and Fight Bias



Mai-Anh Peterson
Viv Yau
Charley Wong
Amy Phung
Kai Pan
Karlie Wu
Ernest Lo
Serena Yuen
Iona Lamy-Yang
Peter Menzies
Jonathan Gray

Country: United Kingdom

Organization: besea.n

An organization whose mission is to empower, educate, and embrace East and South East Asian (ESEA) communities and their allies in the UK. Our advocacy spotlights ESEA experiences through our platforms, research and reporting, and events, providing a safe space for sharing knowledge, creating joy, and fostering connections. Alongside awareness-raising campaigns, content creation, research, education, and events, we offer businesses and organizations professional Diversity & Inclusion services focused on anti-racism and joyful advocacy. Through our partnership with the identity-based violence charity Protection Approaches, we also run hate crime de-escalation and active bystander workshops.

Awards Categories:

  • Data Science for Good
  • Most Impactful Transformation Story

Business Challenge:

At the start of the COVID-19 pandemic in 2020, people of East and South East Asian (ESEA) heritage began to notice that they were alarmingly over-represented in media coverage of the pandemic, with some newspapers going as far as superimposing stock images of ESEA faces onto graphics of the virus itself. Together with harmful rhetoric from some politicians and online, this over-representation of ESEA imagery in pandemic coverage contributed to the scapegoating of ESEA people as the carriers responsible for a deadly virus.

Conscious or unconscious biases that associate ESEA people with the pandemic can foster an environment of hostility, discrimination, blame, and violence. Hate crimes against ESEA communities in the UK, Europe, and North America have skyrocketed, with statistics from some institutions showing incidents doubling or tripling. Even these figures only scratch the surface: many hate crimes go unreported, and reporting an incident is no guarantee that action will be taken.

In response, volunteers at besea.n (Britain's East and South East Asian Network), an organization that advocates for and champions ESEA people in the UK, set out to examine how ESEA people were visually portrayed in pandemic-related news coverage. The size of the dataset was the main obstacle: the major challenge was filtering the articles down to the most ‘relevant’ content, and indeed defining what ‘relevant’ meant.

Business Solution:

The 41 volunteers, from a wide range of backgrounds, started with an initial dataset of over 1.6 million global news articles published between January and July 2020, compiled by the text analysis company Aylien. The UK-only subset of this data included metadata for each article, such as the news source, the source's location, the article body, and keywords.

A further subset of 100,000 articles, sampled at random across publication dates, was grouped into candidate topics using Latent Dirichlet Allocation (LDA), a natural language processing topic-modelling algorithm. LDA is statistical: it models each article as a mixture of topics and each topic as a distribution over words, and assigns articles to topics according to which words occur, and co-occur, within the articles, regardless of word order.
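The topic-modelling step can be illustrated with a minimal LDA sketch. This is not Dataiku's implementation, and the submission does not say which tooling or settings the team used: the collapsed Gibbs sampler, the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy corpus are all illustrative assumptions. Working only from the words in each article, it produces per-article topic counts (whose largest entry gives an article's dominant topic) and per-topic word counts.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenised documents with collapsed Gibbs sampling.

    Hypothetical helper for illustration. Returns (doc_topic, topic_word):
    per-article topic counts and per-topic word counts after the final sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [defaultdict(int) for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start by assigning every token to a random topic.
    for d, doc in enumerate(docs):
        z = [rng.randrange(num_topics) for _ in doc]
        for w, k in zip(doc, z):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z)

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove this token's current topic from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalised full conditional p(z_i = t | all other tokens).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                # Resample the token's topic and add it back into the counts.
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


# Toy, made-up "articles", already lower-cased and tokenised.
docs = [
    "virus cases hospital virus lockdown".split(),
    "hospital cases virus testing lockdown".split(),
    "football match goal league football".split(),
    "league goal match striker football".split(),
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for d, row in enumerate(doc_topic):
    # The dominant topic is the one holding the most tokens in the article.
    print(d, max(range(len(row)), key=row.__getitem__))
```

On a corpus of 100,000 real articles, a dedicated library would be far faster than this pure-Python loop; the sketch only shows the mechanics of the sampler.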

By analysing the words and phrases that appeared in an article, its most relevant (dominant) topic could be identified. Each topic was labelled according to its most frequent words, so that articles could be filtered further by topic. The articles that remained after this filtering covered the development and impact of the pandemic, its extent, and government responses.
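Labelling topics by their most frequent words and keeping only articles whose dominant topic is relevant could look like the sketch below. It is a hypothetical illustration: `label_topics`, `filter_by_topic`, the `top_n` parameter, and the small hand-written count tables (standing in for the output of a fitted topic model) are assumptions, not part of the original workflow.

```python
def label_topics(topic_word, top_n=3):
    """Label each topic with its top_n most frequent words, joined by '/'."""
    labels = []
    for counts in topic_word:
        # Sort by descending count, breaking ties alphabetically for stable labels.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        labels.append("/".join(word for word, _ in ranked[:top_n]))
    return labels


def dominant_topic(row):
    """Index of the topic with the largest count for one article."""
    return max(range(len(row)), key=row.__getitem__)


def filter_by_topic(doc_topic, relevant):
    """Indices of articles whose dominant topic is in the relevant set."""
    return [d for d, row in enumerate(doc_topic) if dominant_topic(row) in relevant]


# Hypothetical topic-word and article-topic counts from a fitted model.
topic_word = [
    {"virus": 40, "cases": 25, "lockdown": 20, "football": 1},
    {"football": 30, "league": 22, "goal": 18, "virus": 2},
]
doc_topic = [[9, 1], [2, 8], [7, 3]]

labels = label_topics(topic_word)
relevant = {i for i, label in enumerate(labels) if "virus" in label}
kept = filter_by_topic(doc_topic, relevant)
print(labels)  # ['virus/cases/lockdown', 'football/league/goal']
print(kept)    # [0, 2]
```

Deciding which labelled topics count as relevant is a human judgement; the keyword test on `"virus"` here is only a stand-in for that decision.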

To obtain a smaller sample for analysis, the relevant articles were further filtered to UK publications only, and the remaining 16,366 articles were sorted by publish date. From these, besea.n volunteers took a final random sample of 2,250 articles and assessed whether each image or video in an article showed a public figure, a named person relevant to the article, a stock photo of people, or no people at all.
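The final sampling step (restrict to UK publications, sort by publish date, draw a random sample) and the four image categories can be sketched as follows. The `Article` fields, the `"GB"` country code, the seed, the function name `uk_sample`, and the toy articles and annotations are hypothetical; the real dataset's metadata fields and the team's sampling tool are not described in the submission. Sorting first gives the seeded sample a fixed starting order, so the same seed reproduces the same draw.

```python
import random
from collections import Counter
from dataclasses import dataclass
from datetime import date
from enum import Enum


@dataclass
class Article:
    # Hypothetical metadata fields; the real dataset's schema may differ.
    title: str
    source_country: str
    published: date


class ImageCategory(Enum):
    # The four categories volunteers used when assessing each image or video.
    PUBLIC_FIGURE = "public figure"
    NAMED_PERSON = "named person relevant to the article"
    STOCK_PEOPLE = "stock photo of people"
    NO_PEOPLE = "no people"


def uk_sample(articles, k, seed=0):
    """Keep UK articles, sort them by publish date, and draw a seeded random sample."""
    uk = sorted(
        (a for a in articles if a.source_country == "GB"),
        key=lambda a: a.published,
    )
    rng = random.Random(seed)
    # Sample without replacement; never ask for more articles than exist.
    return rng.sample(uk, min(k, len(uk)))


# Toy, made-up articles for illustration only.
articles = [
    Article("A", "GB", date(2020, 3, 2)),
    Article("B", "US", date(2020, 3, 5)),
    Article("C", "GB", date(2020, 1, 20)),
    Article("D", "GB", date(2020, 6, 11)),
]
sample = uk_sample(articles, k=2)

# Toy, made-up volunteer annotations, tallied per category.
annotations = [ImageCategory.STOCK_PEOPLE, ImageCategory.NO_PEOPLE, ImageCategory.STOCK_PEOPLE]
print(Counter(annotations))
```

In the actual study the call would correspond to drawing 2,250 articles from the 16,366 UK articles; the annotations themselves were made by volunteers reviewing each image, not by code.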

Business area enhanced: Analytics

Use case stage: Built & Functional

Value Generated:

We used the dataset produced by our NLP analysis in Dataiku to write the resulting report, which has been instrumental in besea.n’s advocacy work. That work has been recognized at government level: the then DCMS Minister Caroline Dinenage condemned pandemic-related racism in the House of Commons, and Sarah Owen MP tabled the UK’s first ever debate on anti-ESEA racism in Parliament.

Value Brought by Dataiku:

Dataiku not only provided the tools and licenses that made this report possible in such a short time (a report that has significantly bolstered the credibility and integrity of our organization), but also gave our volunteers invaluable training that will benefit our organization in future studies. We have also made key connections with the teams at Dataiku.
