ToxBuster: Robust & Trustworthy Toxicity Detection

Hate speech and toxicity pose a serious threat in online spaces, particularly for marginalized communities. Detecting and preventing harmful speech in online games is challenging, and current methods lack transparency and reliability. To address this issue, this project aims to develop a robust and trustworthy toxicity detection model for in-game chat. On the robustness side, we will iterate on an existing context-aware toxicity detection model to address four main areas: rare categories, continuous learning, adversarial learning, and human-in-the-loop. On the trustworthiness side, we will make our language models more transparent, explainable, and observable to researchers and moderators. On the player side, we will design a system for accountability and easy reversibility of decisions. The results are expected to advance the field of continuous learning in NLP, bridge different areas of NLP, and contribute to responsible and trustworthy AI.

Faculty Supervisor:

Reihaneh Rabbany

Student:

Partner:

Ubisoft Divertissement

Discipline:

Computer science

Sector:

Information and cultural industries; Manufacturing

University:

McGill University

Program: