A large amount of health-related data is available only in unstructured form (free-form text). To share this data for secondary purposes, it must be de-identified to protect against inappropriate disclosure of personal health information (PHI). PARAT Text is Privacy Analytics' de-identification software for unstructured data. It automatically discovers and marks PHI in a variety of document formats using gazetteers and a set of rules. The tool's primary limitation is that it depends on the knowledge of human experts and the coverage of its gazetteer lists, and it lacks contextual knowledge. I plan to explore unsupervised and semi-supervised machine learning approaches to make PHI discovery more robust. This work aims to provide more reliable methods for processing text data, which might broaden the partner organization's customer base.
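To make the gazetteer-and-rules approach and its limitation concrete, here is a minimal Python sketch. It is not PARAT Text's implementation: the gazetteer contents, the regular-expression rules, and the function names `find_phi` and `mask` are hypothetical and chosen only for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical gazetteer of known names (illustrative only, not a real PHI list).
NAME_GAZETTEER = {"mary", "smith"}

# Hypothetical regex rules for PHI with a predictable surface form.
RULES = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def find_phi(text: str) -> List[Tuple[int, int, str]]:
    """Return sorted (start, end, label) spans found by rules and gazetteer lookup."""
    spans = []
    # Rule-based matches for structured identifiers.
    for label, pattern in RULES.items():
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), label))
    # Gazetteer lookup: a token is PHI only if it appears in the list,
    # regardless of the surrounding context.
    for m in re.finditer(r"[A-Za-z]+", text):
        if m.group().lower() in NAME_GAZETTEER:
            spans.append((m.start(), m.end(), "NAME"))
    return sorted(spans)


def mask(text: str, spans: List[Tuple[int, int, str]]) -> str:
    """Replace each detected span with its [LABEL], skipping overlapping spans."""
    out, last = [], 0
    for start, end, label in spans:
        if start < last:  # overlaps a span already masked
            continue
        out.append(text[last:start])
        out.append(f"[{label}]")
        last = end
    out.append(text[last:])
    return "".join(out)


if __name__ == "__main__":
    note = "Mary Smith called 613-555-0199 on 2021-03-15."
    print(mask(note, find_phi(note)))
    # A name missing from the gazetteer is not detected.
    missed = "Dr. Okafor reviewed the chart."
    print(mask(missed, find_phi(missed)))
```

The first note is fully masked, but in the second "Okafor" passes through untouched because it is absent from the gazetteer, even though "Dr." makes it obvious from context that it is a name. This is the kind of gap that learned, context-aware models are meant to close.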
Engineering - computer / electrical
Information and communications technologies
University of Ottawa