Efficient computational methods for generating differentially private synthetic tabular data

Generating synthetic data is important for a number of machine learning problems at Mastercard, especially in areas such as generating additional data for imbalanced problems and data sharing. The data is mostly tabular, and a number of techniques for generating tabular data exist in the literature. However, most of these techniques either do not work on large datasets or fail to generate differentially private datasets. We have already done some work in this regard (see https://link.springer.com/chapter/10.1007/978-3-030-92310-5_60). However, the problem is not “solved” yet: it is difficult to generate differentially private datasets from large training sets, and metrics such as machine learning efficacy can be drastically lower. The intern will be asked to work on improving the algorithms currently available in the literature from both a privacy and an accuracy standpoint.

Faculty Supervisor:

Wael El-Dakhakhni

Student:

Partner:

Mastercard

Discipline:

Computer science

Sector:

Professional, scientific and technical services

University:

McMaster University

Program: