Toxplainer: Bias identification and explanation in a toxicity detection language model

Online multiplayer games continue to grow in popularity, and their player base is becoming increasingly diverse. At the same time, a concerning amount of harm and harassment occurs in text chat, which prevents games from being safe and inclusive online spaces. Current detection solutions and moderation systems fall short of addressing this issue at scale. More innovative automated detection tools, which use machine learning to consider the context of the chat rather than a list of keywords, are a first step towards a solution that can process large quantities of data. However, these tools rely on large language models that are known to perpetuate social biases, which hampers their effectiveness and can harm the very communities they are intended to protect. The purpose of this project is to identify identity-based biases, and explain their sources, in language models trained to detect toxicity in the text chat of online multiplayer games. The findings from this research are expected to contribute to the creation of more inclusive online spaces in multiplayer games through effective and fair harm detection tools.

Faculty Supervisor:

Grégoire Winterstein

Student:

Partner:

Ubisoft Toronto

Discipline:

Sociology

Sector:

Information and cultural industries; Manufacturing

University:

Université du Québec à Montréal

Program: