Establishing best practices for generating synthetic pediatric health data

As the COVID-19 pandemic has made painfully clear, it is both important and difficult to analyze the large volumes of patient data collected by hospitals and other healthcare providers. Ideally, data would be widely shared between institutions, so that experts and teams with diverse backgrounds could contribute to the analysis. Unfortunately, this is not possible: sharing healthcare data would severely compromise patient privacy, with many negative consequences. The goal of this project is to develop methods for generating realistic synthetic datasets that closely mimic real longitudinal healthcare records without containing sensitive patient information. This synthetic data could then be shared widely and used as the basis for first-stage analysis.

Elnaz Karimian Sichani
Faculty Supervisor: Aaron Smith
Partner University: