Semi-Supervised Learning for NLP Text Classification - ON-410

Desired discipline(s): Engineering - computer / electrical, Engineering, Computer science, Mathematical Sciences
Company: Munich Re
Project Length: 6 months to 1 year
Preferred start date: As soon as possible.
Language requirement: English
Location(s): Toronto, ON, Canada
No. of positions: 1

About the company: 

In keeping with our global position as an industry leader and innovator, Munich Re is driving transformative change in the life reinsurance industry. Our innovation strategy sets a new standard in digitization that will radically transform our end-to-end business operations and deliver world-class, fully automated business processes and workflows to our North American Life & Health business, which focuses on both traditional reinsurance solutions that concentrate on the transfer of mortality risk as well as living benefits products.

Project description:

Background: Labeling text data for training is a time-intensive and tedious process. The organization already holds a large amount of data, and the ramp-up in its digitization efforts is increasing that amount further, so it is imperative to shorten the AI development cycle through automated data labeling.

Goals: Semi-supervised learning is a hybrid approach that combines elements of supervised and unsupervised learning to train models with a small amount of labeled data and a large amount of unlabeled data. We are interested in exploring different semi-supervised learning techniques (including deep-learning-based ones) to find those that best suit text data in the insurance/reinsurance industry. As a first step, the candidate will work on identifying “cause of death” from unstructured obituary text, which will complement and enrich our internal claims data.
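
For illustration only, the sketch below shows self-training, one common semi-supervised technique: a classifier is fit on the few labeled examples, its confident predictions on unlabeled text are added as pseudo-labels, and the model is refit. It is a minimal sketch under stated assumptions, not the method the project will necessarily use. The example sentences, the label names ("cancer", "cardiac"), the helper names, and the 0.8 confidence threshold are all hypothetical, and the tiny naive Bayes classifier stands in for whatever model is eventually chosen.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; a real pipeline would do far more.
    return text.lower().split()


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict_proba(self, text):
        total = sum(self.class_counts.values())
        tokens = [t for t in tokenize(text) if t in self.vocab]
        log_scores = {}
        for label, count in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(count / total)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            log_scores[label] = score
        # Convert log scores to normalized probabilities.
        top = max(log_scores.values())
        exp = {label: math.exp(s - top) for label, s in log_scores.items()}
        z = sum(exp.values())
        return {label: v / z for label, v in exp.items()}


def self_train(labeled, unlabeled, threshold=0.8, max_rounds=5):
    """Grow the training set with confident pseudo-labels, then refit."""
    texts = [t for t, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    model = NaiveBayes().fit(texts, labels)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for text in pool:
            probs = model.predict_proba(text)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                texts.append(text)
                labels.append(label)
                added += 1
            else:
                remaining.append(text)
        pool = remaining
        if added == 0:
            break  # nothing confident enough was found; stop early
        model = NaiveBayes().fit(texts, labels)
    return model, pool


# Hypothetical toy data, invented purely for this example.
labeled = [
    ("passed away after a long battle with cancer", "cancer"),
    ("died suddenly of a heart attack", "cardiac"),
]
unlabeled = [
    "lost her battle with cancer last week",
    "suffered a fatal heart attack at home",
    "passed peacefully surrounded by family",
]

model, leftover = self_train(labeled, unlabeled)
print(model.predict_proba("heart attack"))
print(leftover)
```

Text that never reaches the confidence threshold stays in the unlabeled pool rather than being forced into a class, which is one simple way to limit the spread of wrong pseudo-labels.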

Expected Outcomes: The candidate is expected to deliver generalizable algorithms for labeling unstructured text data.

Required expertise/skills: 

  • Python
  • Experience in Object-Oriented Programming languages, e.g., C#, Java, C++
  • Foundational understanding of Machine Learning, particularly as it relates to NLP
  • Pattern Recognition
  • Probability and/or Statistics