Developing auto-annotation of actions in movies assisted by a haptic track

D-BOX designs and manufactures motorized seats for home theaters and movie theaters. These seats create an immersive environment by generating movements and vibrations that are synchronized with the action in the movie. Currently, programming these seat motions requires a great deal of manual work. This project focuses specifically on the action detection part of that process. The goal is to develop deep learning models that use the sound and images from a movie to detect a series of events and their timing, such as gunshots, explosions, fights, and car engine sounds. The approach is a two-stream topology with sound and video as the input streams. The sound will be converted to spectrogram frames, so that both streams process image-like content and the model can draw on two cues to understand the action in the movie. RNN and LSTM models will be explored, along with other options such as YOLO, SSD, Faster R-CNN, and recent Transformer-based models. This project benefits D-BOX Technologies Inc. because it could automate the generation of haptic effects.
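As an illustration of how the audio stream could be turned into image-like input, the sketch below computes a log-magnitude spectrogram and cuts it into fixed-size patches. It is only a minimal sketch under stated assumptions, not the project's actual pipeline: the function names `log_spectrogram` and `spectrogram_patches`, the 16 kHz sample rate, the window and hop sizes, the patch length, and the synthetic test tone are all hypothetical choices made for the example.

```python
import numpy as np


def log_spectrogram(audio, n_fft=512, hop=256):
    """Return a log-magnitude spectrogram of shape (n_frames, n_fft // 2 + 1).

    n_fft and hop are illustrative values, not taken from the project.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [audio[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # The real FFT of each frame gives the magnitude per frequency bin.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # log1p compresses the dynamic range, as is common for audio features.
    return np.log1p(magnitude)


def spectrogram_patches(spec, frames_per_patch):
    """Split a spectrogram into non-overlapping image-like patches.

    Returns an array of shape (n_patches, frames_per_patch, n_bins);
    trailing frames that do not fill a whole patch are dropped.
    """
    n_patches = spec.shape[0] // frames_per_patch
    usable = spec[: n_patches * frames_per_patch]
    return usable.reshape(n_patches, frames_per_patch, spec.shape[1])


# Synthetic stand-in for a movie soundtrack: 2 seconds of a 440 Hz tone.
sample_rate = 16000  # assumed sample rate
t = np.arange(sample_rate * 2) / sample_rate
audio = np.sin(2 * np.pi * 440 * t)

spec = log_spectrogram(audio)
patches = spectrogram_patches(spec, frames_per_patch=32)
print(spec.shape, patches.shape)
```

Each patch could then be fed to the audio branch of a two-stream model in the same way an image frame is fed to the video branch. How the patches are aligned in time with the video frames is left open here.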

Faculty Supervisor:

Adam Oberman

Student:

Partner:

D-BOX Technologies Inc.

Discipline:

Computer science

Sector:

Manufacturing

University:

McGill University

Program: