A Cross-modal Video Representation Learning Framework for Query Retrieval and Automatic Trailer Generation

Grokvideo Inc. develops technologies for extracting the most useful information from video content. This research project will provide new solutions for text-based query-video retrieval and automatic trailer generation for eventful videos such as movies and serials. Current methods appear to work well for short, non-eventful videos such as documentaries, but fail otherwise. The major challenges are handling raw video that may lack text descriptions, maintaining coherence in the generated trailer, and generating trailers that focus on a specific emotion or genre, such as action, comedy, or dance. The project’s main objective is to develop an end-to-end neural network framework to which users can submit video datasets, with or without descriptive captions, and fine-tune the model for their needs. The framework will be pre-trained and offered as a service to Grokvideo clients. This research will benefit the partner company as well as others in the entertainment sector in Canada and worldwide.

Faculty Supervisor:

Sudhir Mudur

Student:

Partner:

GROK VIDEO Inc

Discipline:

Computer science

Sector:

Professional, scientific and technical services

University:

Concordia University

Program: