Federated Learning for Language Models

The conventional approach to building Machine Learning models is to gather all customer data in one place and then train the model on it. However, nothing guarantees that personal data won’t be leaked by whoever trains the model. Customers therefore face a difficult choice: either allow companies to collect their data to improve products, or restrict information sharing to protect their privacy. In this project, we eliminate the need for that choice by developing a system that analyzes customers’ data without the data ever leaving their devices. We employ Federated Learning techniques, which allow machine learning models to be trained on the data without the company ever seeing it. Our system targets Natural Language Processing applications, one of the most privacy-sensitive yet most in-demand areas of Machine Learning. To further strengthen customers’ trust, we apply additional privacy methods so that no information about the data can be inferred from the communication between customers’ devices and the company’s servers.
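The core idea of Federated Learning can be illustrated with federated averaging (FedAvg), a widely used algorithm in which each client trains on its own data and sends back only model weights, which the server averages. The abstract does not say which federated algorithm the project uses, so the sketch below is an assumption for illustration only: it uses a small linear least-squares model instead of a language model, simulates all clients in one process, and omits the additional privacy methods mentioned above. The names `local_update` and `federated_averaging` are hypothetical.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Client side (hypothetical helper): a few steps of gradient descent on
    this client's private data. X and y never leave this function; only the
    updated weights are returned to the server."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of mean squared error / 2
        w -= lr * grad
    return w

def federated_averaging(global_w, clients, rounds=20):
    """Server side (hypothetical helper): each round, every client trains
    locally, and the server averages the returned weights, weighted by each
    client's number of examples."""
    for _ in range(rounds):
        updates, sizes = [], []
        for X, y in clients:
            updates.append(local_update(global_w, X, y))
            sizes.append(len(y))
        global_w = np.average(updates, axis=0, weights=sizes)
    return global_w

# Usage: three simulated clients whose private data share one true model.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ true_w))

w = federated_averaging(np.zeros(2), clients, rounds=50)
print(np.round(w, 3))
```

The server only ever handles weight vectors, which is what lets training proceed without centralizing the data. Note that shared weights can still leak information about the training data, which is why additional privacy methods on the device-to-server communication are needed on top of plain averaging.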

Faculty Supervisor:

Aleksandar Nikolov

Student:

Partner:

Microsoft Canada

Discipline:

Computer science

Sector:

Information and cultural industries

University:

University of Toronto

Program: