Transductive open-vocabulary segmentation

Modern deep neural networks have demonstrated remarkable performance across various visual recognition tasks, including image classification and semantic segmentation. The effectiveness of these networks largely relies on having abundant labeled examples for each desired category. Acquiring such datasets, however, poses particular challenges in the context of semantic segmentation, where a label is required for each pixel in every image. Furthermore, while datasets fulfilling this requirement do exist in practice, they often come with a set of limited pre-defined categories. This means that if a novel category has to be recognized, annotating numerous new images and potentially re-annotating existing ones is required, which presents obvious practical challenges that impede the scalability of these models in these real-world scenarios.
To address this limitation, several learning paradigms have emerged in recent years. A challenging and practical scenario is zero-shot semantic segmentation, also referred to as open-vocabulary segmentation, where no pixel- labeled samples for each novel category are available at all. To overcome the total lack of pixel-wise information, zero-shot approaches incorporate knowledge about unseen classes from other available sources, for example in the form of text descriptions.

Faculty Supervisor:

Jose Dolz

Student:

Partner:

Thales Recherche et Technologie

Discipline:

Engineering

Sector:

Management of companies and enterprises; Manufacturing; Professional, scientific and technical services

University:

École de technologie supérieure

Program: