Interfaces and algorithms to interactively improve medical datasets for machine learning - BC-372

Preferred Disciplines: Engineering / Computer Science. Master's or Ph.D.
Project length:  8 to 12 months (2 units)
Desired start date: May 1, 2018
Location: Vancouver, BC
No. of Positions: 1 
Preferences: Must be able to work onsite at the Vancouver office. Language: English
Company: Galiano Medical Solutions Inc.


About Company:

Galiano Medical Solutions Inc. is creating leading edge medical image processing technology that exploits machine learning to empower physicians and improve patient care.

Project Description:

The success of our algorithms depends on the availability of high-quality data, which in our current project means working with chest x-ray images (CXRs) that are accurately labeled with the findings that a radiologist would report in their examination.

Publicly available datasets exist to help us build our algorithms; for example, the National Institutes of Health (NIH) recently released a large set of labeled CXR images. The NIH collection has over 110,000 images – sufficient in quantity for machine learning studies – but to make a collection of that size feasible, its labels were computer-generated from radiologists’ reports. Because of this automated reading process, many of the labels are incorrect. Presenting these “weakly-labeled” images to expert humans for review allows label quality to be improved, a critical step on the path to better image processing solutions.

The goal of this project is to identify and adapt algorithms and their corresponding user interfaces to make significant improvements to the weakly-labeled CXR dataset, while leveraging the limited time available from expert label reviewers. As the larger machine learning community – far beyond the medical field – relies on weakly-labeled but large publicly-available datasets, this study has a potentially broad impact.

Background and required skills

Research Objectives/Sub-Objectives:

  • Preferred methods for:
    • choosing which images to show an expert for re-labeling
    • re-labeling images that the expert doesn’t see, based on image-image similarity
  • Measurements of the success of each of the algorithms attempted
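
To make the objectives above concrete, here is a minimal Python sketch of one possible review loop: pick the images whose weak label disagrees most with their nearest neighbours, have an expert correct them, propagate the corrected labels to unreviewed images by similarity, and measure accuracy. Everything in it is an illustrative assumption rather than the project's chosen method: the function names, the use of cosine similarity on precomputed feature vectors, the neighbour-disagreement selection rule, the binary labels, and the tiny toy data (the expert is simulated by looking up the true label).

```python
import numpy as np


def cosine_similarity(features):
    """Pairwise cosine similarity between row feature vectors."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    return unit @ unit.T


def pick_for_review(sim, labels, reviewed, k, budget):
    """Pick unreviewed images whose label disagrees most with their k nearest neighbours."""
    s = sim.copy()
    np.fill_diagonal(s, -np.inf)  # an image is not its own neighbour
    neighbours = np.argsort(-s, axis=1, kind="stable")[:, :k]
    disagreement = (labels[neighbours] != labels[:, None]).mean(axis=1)
    candidates = np.flatnonzero(~reviewed)
    order = np.argsort(-disagreement[candidates], kind="stable")
    return candidates[order[:budget]]


def propagate_labels(sim, labels, reviewed, k):
    """Relabel each unreviewed image by majority vote of its k most similar reviewed images."""
    reviewed_idx = np.flatnonzero(reviewed)
    out = labels.copy()
    for i in np.flatnonzero(~reviewed):
        ranked = np.argsort(-sim[i, reviewed_idx], kind="stable")[:k]
        nearest = reviewed_idx[ranked]
        out[i] = int(labels[nearest].mean() >= 0.5)
    return out


def accuracy(labels, truth):
    """Fraction of labels that match the reference labels."""
    return float((labels == truth).mean())


# Toy data (hypothetical): six images as 2-D feature vectors, two of them mislabeled.
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
                     [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]])
truth = np.array([0, 0, 0, 1, 1, 1])
weak = np.array([0, 1, 0, 1, 1, 0])
reviewed = np.zeros(len(weak), dtype=bool)

sim = cosine_similarity(features)
to_review = pick_for_review(sim, weak, reviewed, k=2, budget=2)

# Simulated expert review: the selected images receive their true labels.
labels = weak.copy()
labels[to_review] = truth[to_review]
reviewed[to_review] = True

improved = propagate_labels(sim, labels, reviewed, k=1)
print("accuracy before:", accuracy(weak, truth))
print("accuracy after: ", accuracy(improved, truth))
```

In a real study the feature vectors would come from a trained similarity model, and accuracy would be measured against a held-out set of expert-verified labels rather than a known ground truth.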


  • Data ingestion
  • Front-end design
  • Similarity training
  • Algorithm experimentation
  • Review and completion

Expertise and Skills Needed:

  • Essential Skills
    • Experience implementing numerical computational algorithms
    • Demonstrable knowledge of software development best practices
    • Proficiency in Python development in a Linux environment
    • Experience in at least three of the following Desired Skills
  • Desired Skills
    • Machine Learning or Data Science solution development
    • Computer Vision application development
    • User Interface design and implementation
    • Web Development using JavaScript
    • Docker, ReactJS, Bootstrap, TensorFlow, Flask or OpenCV development
    • Commercial software development in an Agile/DevOps environment


For more info or to apply to this applied research position, please

  1. Check your eligibility and find more information about open projects.
  2. Interested students need to get approval from their supervisor and send their CV, along with a link to their supervisor’s university webpage, by applying through the webform or directly to Amin Aziznia aaziznia(a) .