Estimating Optimal Treatment Regimes from Electronic Health Records using Natural Language Processing for Unmeasured Confounding

In precision medicine, an optimal dynamic treatment regime (DTR) is a sequence of decision rules that individualizes medical treatment. DTR estimation methods use observational data, such as electronic health records (EHR), which may lack variables that capture doctors’ rationale behind treatment assignment. Although such variables, also referred to as confounders, may not be directly recorded in EHRs, they may be embedded in unstructured medical notes. This project introduces Relational-variational Graph Autoencoders (R-VGAEs) to precision medicine for DTR estimation. R-VGAEs are particularly well suited to identifying latent features within graph-structured data, such as medical notes. To showcase its performance, we aim to compare our proposed method with DTR estimation based on conventional unsupervised natural language processing methods, such as word2vec, doc2vec and ELMo. We also apply it to data from MIMIC-IV, a publicly available dataset, to show that the additional use of medical note data can further improve treatment personalization.

Faculty Supervisor:

Olli Saarela

Student:

Partner:

The University of Tokyo

Discipline:

Mathematics

Sector:

Education

University:

University of Toronto

Program: