Personalization of a health-specific search engine - QC-685

Project type: Research
Desired discipline(s): Computer science, Mathematical Sciences, Statistics / Actuarial sciences
Company: Clinia Health Inc.
Project Length: Flexible
Preferred start date: As soon as possible.
Language requirement: Flexible
Location(s): Montreal, QC, Canada
No. of positions: 2
Desired education level: Undergraduate/Bachelor, Master's, PhD
Open to applicants registered at an institution outside of Canada: No

About the company: 

Clinia makes it easy for health organizations to implement and deploy personalized health-grade navigation within their own digital platforms. Our modular AI-powered infrastructure understands every individual’s unique situation, enabling better classification and discovery of approved and trusted health resources. Powering millions of journeys every year, Clinia’s products are built to scale with the largest digital health companies. Organizations of all sizes work with us to transform the way people and care teams find and access care. Today, Clinia is proudly headquartered in Montréal, Canada, with team members spread across North America, South America, and Europe.

Project description:

Clinia already has tools and resources in place to detect user intent. In the first part of this project, we want to review our model’s training data, re-evaluate how well it fits our use case, and improve its quality by adding new elements. For example, we want our Named Entity Recognition (NER) model to identify medical entities correctly while minimizing the likelihood that syntactically similar non-medical terms are interpreted as medical entities. To do this, we would like to add negative examples containing these syntactically similar terms to our training set. Sentences containing pairs of related concepts could also be introduced to bring their embeddings closer together.

The second part of the project concerns terminology matching. Because NER models naturally tend to bring entities of the same class closer together in the embedding space, this part requires designing a new model that creates meaningful embeddings for terminology matching. We would like an embedding space where concepts are grouped according to their relationships with other concepts, regardless of their class. For example, the concept of "heart" should sit close to heart diseases, heart-related procedures such as surgery, and even products such as a pacemaker.

The research intern will have access to the technologies developed by Clinia and to the expertise of its employees. More specifically, they will have the opportunity to work with state-of-the-art models such as BERT-based architectures, and possibly with large language models (LLMs), in addition to the ontology developed by Clinia.
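To make the two ideas in the project description concrete, here is a minimal sketch in plain Python. It is not Clinia's implementation: the BIO tagging scheme, the CONDITION label, the example sentences, the toy two-dimensional vectors, and the choice of a triplet loss with cosine distance are all assumptions made for illustration. The first part shows how a negative example for NER might look (a look-alike term tagged entirely as "O"); the second shows one possible objective that pulls related concepts together regardless of their class.

```python
import math

# --- Part 1: negative examples for NER, in BIO format ---
# The label name and the sentences are hypothetical, invented for illustration.

def bio_example(tokens, entity_spans):
    """Pair each token with a BIO tag.

    entity_spans is a list of (start, end, label) with end exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in entity_spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return list(zip(tokens, tags))

# Positive example: "cold" is a medical condition here.
positive = bio_example(["I", "have", "a", "cold"], [(3, 4, "CONDITION")])

# Negative example: same surface form, non-medical sense.
# No spans are given, so every token is tagged "O".
negative = bio_example(["The", "water", "is", "cold"], [])

# --- Part 2: a triplet objective for relationship-based embeddings ---
# Assumed technique: triplet loss with cosine distance. The vectors are toy
# values, not outputs of any real model.

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def triplet_loss(anchor, related, unrelated, margin=0.5):
    """Zero once the related concept is closer to the anchor than the
    unrelated one by at least `margin`; positive otherwise."""
    gap = cosine_distance(anchor, related) - cosine_distance(anchor, unrelated)
    return max(0.0, gap + margin)

heart = [1.0, 0.0]
pacemaker = [0.9, 0.1]  # different class (product), but related to "heart"
knee = [0.0, 1.0]       # same class as "heart" (anatomy), but unrelated

well_arranged = triplet_loss(heart, pacemaker, knee)   # related concept is near
badly_arranged = triplet_loss(heart, knee, pacemaker)  # roles swapped
```

In this toy setup the first call yields no loss because "pacemaker" already sits near "heart", while the swapped call yields a positive loss that training would reduce. In practice the vectors would come from an encoder such as a BERT-based model, and the triplets could be drawn from relationships in an ontology.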

Required expertise/skills: 

The ideal research intern would likely have experience working with large NLP datasets and processing techniques as well as the ability to work with NLP models for various tasks. The candidate would have a good understanding of deep learning models based on Transformers (or similar architectures) and have experience training and evaluating them. More specifically, the intern would have the following skills:

  • Intermediate to advanced Python skills
  • Intermediate to advanced knowledge of the PyTorch, Hugging Face and spaCy libraries
  • Familiarity with Pandas or PySpark (an asset) and/or multiprocessing for data processing
  • Experience working with microservices (Docker)
  • Experience working with cloud environments (AWS an asset)