Fairness Attacks on Large Language Models

Under the European General Data Protection Regulation (GDPR), businesses that use customer data to make decisions must, on request, explain those decisions to the customer in plain human terms. The rise of natural language generation (NLG) and chain-of-thought (CoT) reasoning is an opportunity to combine the power of deep learning, automated decision-making, and explainable AI. However, two important challenges stand in the way of making these systems useful. The first is to align the AI system with human preferences, in both its decisions and its explanations. The second is to make the AI system fair and robust to adversarial attacks. The aim of this project is to investigate the robustness of large language models (LLMs) to changes that should not affect their decisions, such as meaning-preserving textual reformulations and changes to sensitive information such as sex and race.

Faculty Supervisor:

Quentin Cappart; Louis-Martin Rousseau

Student:

Partner:

ServiceNow Canada

Discipline:

Computer science

Sector:

Transportation (excluding aerospace); Technology; Information and Communications Technology

University:

Polytechnique Montréal

Program: