Secure Cross-service Genomic Data Federated Analysis with GraphQL

Health research data is growing not only in volume but also in richness. Health research data sets now routinely contain structured clinical data, unstructured clinical notes, genomic reads and variants, other “‘omics” data such as RNA expression (transcriptomic) or epigenomic data, and medical imaging, with new data types emerging each year.
A key challenge in making this range of data types accessible, while maintaining strong privacy and security safeguards, is enabling queries that span the individual tools serving each kind of data, in any combination. For example, a researcher may want to find cancer patients (structured clinical records) with variants in the TP53 gene (genomic files) that result in unusually high expression of gene products (slices of RNA expression matrices), or to examine unstructured clinical notes in the context of medical images. We propose to investigate cross-service data queries and analyses using the emerging GraphQL standard. GraphQL is flexible enough to let us develop high-level, privacy-enhancing query APIs on top of the existing services, and powerful enough to let us build a toolkit for “joins” across quite disparate data types. Differential privacy will be built into this query layer so that researchers can run aggregate analyses whose results reveal little about any individual patient.
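To make the proposed cross-service “join” and differentially private analysis more concrete, here is a minimal Python sketch under stated assumptions. The three in-memory dictionaries are hypothetical stand-ins for separate clinical, genomic, and expression services; plain functions play the role of GraphQL resolvers (no GraphQL library is used); and the names `matching_patients`, `dp_count`, `resolve_cohort_count`, and the `cohortCount` query field are illustrative only. For the privacy step it assumes the standard Laplace mechanism applied to a count query, which has sensitivity 1; the project itself may use a different mechanism.

```python
import random

# Hypothetical stand-ins for three separate services; in a real deployment
# each would sit behind its own API and GraphQL resolver.
CLINICAL = {  # patient_id -> diagnosis (structured clinical records)
    "p1": "cancer", "p2": "cancer", "p3": "healthy", "p4": "cancer",
}
VARIANTS = {  # patient_id -> genes carrying a variant (genomic files)
    "p1": {"TP53"}, "p2": {"BRCA1"}, "p3": {"TP53"}, "p4": {"TP53", "KRAS"},
}
EXPRESSION = {  # (patient_id, gene) -> expression z-score (RNA matrix slice)
    ("p1", "TP53"): 2.7, ("p3", "TP53"): 0.4, ("p4", "TP53"): 1.1,
}

# The kind of cross-service query a GraphQL layer might expose
# (schema and field names are illustrative only).
EXAMPLE_QUERY = """
query {
  cohortCount(diagnosis: "cancer", gene: "TP53", minExpressionZ: 2.0, epsilon: 1.0)
}
"""


def matching_patients(diagnosis: str, gene: str, min_z: float) -> list:
    """Join the three services on patient ID, applying one filter per service."""
    clinical = {pid for pid, dx in CLINICAL.items() if dx == diagnosis}
    with_variant = {pid for pid in clinical if gene in VARIANTS.get(pid, set())}
    # Patients with no expression measurement for this gene are excluded.
    return sorted(
        pid for pid in with_variant
        if EXPRESSION.get((pid, gene), float("-inf")) >= min_z
    )


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(patients: list, epsilon: float, rng: random.Random) -> float:
    # A count has sensitivity 1: adding or removing one patient changes it
    # by at most 1, so Laplace noise with scale 1/epsilon gives epsilon-DP.
    sensitivity = 1.0
    return len(patients) + laplace_noise(sensitivity / epsilon, rng)


def resolve_cohort_count(diagnosis: str, gene: str, min_expression_z: float,
                         epsilon: float, rng: random.Random = None) -> float:
    """Resolver-like entry point: only the noisy count leaves the query layer."""
    if rng is None:
        rng = random.Random()
    cohort = matching_patients(diagnosis, gene, min_expression_z)
    return dp_count(cohort, epsilon, rng)


if __name__ == "__main__":
    print(matching_patients("cancer", "TP53", 2.0))  # ['p1']
    print(resolve_cohort_count("cancer", "TP53", 2.0, epsilon=1.0,
                               rng=random.Random(0)))
```

In this sketch the exact patient list stays inside the query layer and only the noisy aggregate is returned to the researcher; a smaller `epsilon` adds more noise and gives a stronger privacy guarantee.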

Siyue Wang
Faculty Supervisor: Yun William Yu
Partner University: