Data Anonymization for Medical Records

Data anonymization removes personal or other information that would identify an individual or a group of persons from a collection of records. Anonymization should also mask information that could be combined with other publicly available data to re-identify individuals. This project studies data anonymization for medical records, in particular radiology reports that are generated through the use of particular software. The project involves the following steps: i) identify the information that needs to be masked by studying a random subset of data sampled from the database; ii) develop individual NLP components, or adapt existing off-the-shelf tools for them, such as identifying the grammatical tags of words, recognizing multi-word named entities, and handling ambiguities; iii) integrate the individual components into a full system and evaluate the performance of the components as well as the end-to-end system; and iv) implement a process flow for anonymizing medical records, including the structured tables in the records database. We would also like to preserve the complete medical history of individuals and retain their links to individual physicians and medical technicians, while suitably anonymizing the information. The expert industrial partner will identify the information that is required to be masked and will work closely with the academic team in evaluating and ensuring the privacy compliance of the anonymization process flow that would be deployed in the partner organization.
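One way to keep a patient's history and their links to physicians intact while masking names is to replace each identifier with a stable pseudonym, so that the same person maps to the same token in every report and table. The following is a minimal Python sketch of that idea and not the project's actual method. The key, the function names, and the `entities` dictionary are all hypothetical; in a full system the entity mentions would come from the named-entity recognition component rather than a fixed dictionary.

```python
import hashlib
import hmac
import re

# Hypothetical secret key (assumption); a real deployment would load it
# from secure storage and never hard-code it.
SECRET_KEY = b"example-key-not-for-production"


def pseudonym(value, kind):
    """Map an identifier to a stable pseudonym of the form KIND_<8 hex digits>."""
    # Normalize case and whitespace so spelling variants share one pseudonym.
    normalized = " ".join(value.lower().split())
    message = f"{kind}:{normalized}".encode("utf-8")
    digest = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return f"{kind}_{digest[:8]}"


def anonymize(text, entities):
    """Replace each known entity mention in `text` with its pseudonym.

    `entities` maps a surface string to its kind, e.g. "PATIENT".
    It stands in for the output of a named-entity recognizer.
    """
    # Process longer mentions first so multi-word names are replaced whole.
    for mention in sorted(entities, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(mention) + r"\b", re.IGNORECASE)
        text = pattern.sub(pseudonym(mention, entities[mention]), text)
    return text


entities = {"Jane Doe": "PATIENT", "Alan Smith": "PHYSICIAN"}
report_1 = anonymize("Patient Jane Doe was seen by Dr. Alan Smith.", entities)
report_2 = anonymize("Follow-up imaging for JANE DOE.", entities)
```

Because the pseudonym is a keyed hash of the normalized name, both reports carry the same patient token, so the records remain linkable without storing a lookup table of real names. Truncating the digest to 8 hex digits is only for readability here; a longer token would reduce the chance of two people sharing a pseudonym.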

Baskaran Sankaran
Faculty Supervisor: Dr. Anoop Sarkar
British Columbia