Efficient Learning Methods for Multimodal Understanding

This research focuses on developing representation-learning architectures and algorithms for multimodal understanding tasks (e.g., dialogue-based image/video retrieval) while reducing the need for human supervision in the form of costly annotations. To achieve this goal, a learning system must be able to: (1) learn new tasks or concepts from only a few examples; (2) effectively reuse knowledge it has already acquired; (3) rely on one modality (e.g., text/audio) to fill in gaps in another (e.g., vision); and (4) retain high performance on all previously learned tasks and concepts. This work will focus on answering two key questions: (1) How can problem structure be leveraged to design learning algorithms that use limited annotations more effectively? (2) What mechanisms enable efficient learning from few examples?

Faculty Supervisor:

Animesh Garg

Nikita Dvornik

Samsung Electronics Canada

Computer Science

University of Toronto