Multimodal Procedure Understanding

Procedural content (text and video) is abundant on the internet and is regularly used in our daily lives, e.g., when we follow a cooking recipe to make a dish or watch an instructional video on furniture assembly. Automatically understanding such content enables the development of various types of AI assistants, including those that answer our questions (e.g., asking a cooking assistant how much milk a recipe calls for) and those that guide users through a procedure (e.g., when the user forgets an important step). This project focuses on various aspects of automatically understanding procedural content.

Faculty Supervisor:

Frank Rudzicz

Student:

Partner:

Samsung Electronics Canada

Discipline:

Computer science

Sector:

Technology; Information and Communications Technology; New and Digital Media

University:

University of Toronto

Program:

Accelerate
