Towards a Foundation Model for Scene Understanding in Autonomous Driving

Autonomous vehicles operating at Level 4 autonomy require a comprehensive understanding of their current driving situation and its context. Current systems rely mainly on perceptual information from video, lidar, and radar, and often lack a higher-level understanding of the concrete situation the vehicle is in. To address this gap, this research project aims to develop a foundation model for scene understanding in autonomous driving. The trained model is intended to serve as an additional input for downstream tasks that require a nuanced understanding of the vehicle’s current driving scene, such as motion prediction and planning. We will leverage an existing knowledge graph built on the public nuScenes dataset, which represents individual driving scenes, including the agents and objects present, the relationships among them, and information from map data. The main task is to train and evaluate foundation models that combine a suitable deep learning architecture, such as a Transformer with multi-head self-attention, with different scene representations as input. We plan to publish the results at a well-regarded conference in the field to contribute to the growing body of knowledge in autonomous driving research.

Faculty Supervisor:

Ioannis Mitliagkas

Student:

Partner:

Bosch

Discipline:

Computer science

Sector:

Wholesale trade

University:

Université de Montréal

Program: