Joint burst image denoising and deblurring

Capturing high-quality images is a major selling point of modern flagship smartphones. However, due to physical constraints on sensor size, capturing a good image with a mobile phone is challenging, especially in low-light conditions. To obtain a bright image, one has to either use a long exposure or digitally gain up a short-exposure image. The former results in blurry images due to camera shake and scene motion, while the latter yields noisy images. We will explore methods that combine complementary information from both types of images, captured synchronously by two separate cameras.
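As a toy illustration of this tradeoff, the sketch below simulates both capture modes under a simple Poisson-Gaussian sensor model. The function and parameter names are illustrative assumptions, not part of any existing codebase.

```python
import numpy as np
from scipy.ndimage import convolve

def simulate_capture_pair(radiance, gain=8, read_noise=2.0, rng=None):
    """Simulate a blurry long exposure and a noisy, gained-up short exposure
    of the same scene. `radiance` is a non-negative float array giving the
    expected photon count per unit exposure time."""
    rng = rng or np.random.default_rng(0)

    # Long exposure: collects `gain`x more photons, but camera shake smears
    # the image (modeled here as a horizontal linear blur kernel).
    kernel = np.zeros((9, 9))
    kernel[4, :] = 1.0 / 9.0
    long_exp = convolve(radiance * gain, kernel)
    long_exp = rng.poisson(long_exp) + rng.normal(0.0, read_noise, radiance.shape)

    # Short exposure: sharp, but photon-starved; the digital gain needed to
    # brighten it amplifies both the shot noise and the read noise.
    short_exp = rng.poisson(radiance) + rng.normal(0.0, read_noise, radiance.shape)
    short_exp = short_exp * gain

    return long_exp, short_exp
```

A joint method thus receives a sharp-but-noisy and a clean-but-blurry observation of the same scene, and can fuse them into an image that is both sharp and clean.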

Efficient Multi-frame Obstruction Removal

Modern mobile phones have become the dominant photography device in recent years. However, their users are often not professional photographers, and thus lack the skills to choose proper lighting, shot framing, and camera settings. In particular, photographs are often taken in unfavourable conditions where the scene of interest is obstructed by a fence or a window. We would like to remove such obstructions automatically. Capturing the scene from multiple viewpoints not only helps to identify the obstruction, but also helps to remove it.
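As a rough sketch of why multiple viewpoints help: once the frames are registered to the background layer, the obstruction lands at a different pixel position in each frame because of parallax, so a per-pixel temporal median recovers the background. The snippet below assumes pre-aligned frames and shows this classical baseline, not the efficient method this project would develop.

```python
import numpy as np

def median_composite(aligned_frames):
    """Remove a thin occluder (e.g., a fence) from a burst of frames that
    have already been aligned to the background layer. Because the occluder
    covers any given background pixel in only a minority of frames, the
    per-pixel temporal median keeps the background."""
    stack = np.stack(aligned_frames, axis=0)  # (num_frames, H, W[, C])
    return np.median(stack, axis=0)
```

Real obstruction-removal pipelines must also estimate the motions of the two layers jointly, which is where most of the computational cost lies.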

Weakly supervised representation learning for sequential and composite actions

Camera-enabled, AI-based personal assistants will need to recognize human actions in order to be safe and effective. Current machine learning approaches for action recognition require extensive datasets of annotated videos depicting the actions to be recognized. Such datasets are expensive to acquire. The goal of this project is to reduce the amount of annotation required to train viable action recognition systems.

Efficient Learning Methods for Multimodal Understanding

The main focus of this research is to develop representation learning architectures and algorithms that support a variety of multimodal understanding tasks while reducing the need for human supervision in the form of costly annotations.

Deep learning-based illuminant estimation for mobile devices

Humans possess the ability to see objects as having the same color even when viewed under different illuminations. Cameras inherently lack this capability, so a process called auto white balance (AWB) is applied to mimic this behavior of the human visual system. AWB is one of the first steps in the series of operations the camera performs as the raw image recorded by the sensor is processed, and it plays a crucial role in ensuring that the colors in the final image delivered to the user are represented correctly.
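For context, the classical gray-world algorithm is about the simplest illuminant estimator: it assumes the average scene color is achromatic, takes the per-channel mean as the illuminant estimate, and divides it out. The sketch below (assuming a demosaicked, linear RGB image with values in [0, 1]) is a baseline for comparison, not the learned estimator this project targets.

```python
import numpy as np

def gray_world_awb(raw_rgb):
    """Gray-world white balance: estimate the illuminant as the per-channel
    mean, then apply gains that map it to neutral (green gain fixed to 1,
    as is conventional for camera white balance)."""
    illuminant = raw_rgb.reshape(-1, 3).mean(axis=0)  # (R, G, B) estimate
    gains = illuminant[1] / illuminant                # anchor the green channel
    return np.clip(raw_rgb * gains, 0.0, 1.0)
```

Deep learning-based estimators replace the gray-world heuristic with a network that predicts the illuminant from image content, while the per-channel correction step typically stays the same.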

Modeling local tone-mapping for raw image reconstruction-aware deep image compressors

Cameras apply substantial processing to the raw image recorded by the sensor to enhance its brightness, contrast, and colors and make the output image visually pleasing. Finally, the image is compressed to reduce its file size. These operations make it difficult to recover the raw sensor image, which is needed for several computer/machine vision tasks. Existing AI methods for inverting the camera’s compressed output image back to the raw sensor image assume that only global color and tone manipulations were applied.
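To see what the global assumption buys, note that a global tone curve applies one function f to every pixel, so it can be undone with its inverse; a local tone-mapping operator makes the curve depend on each pixel's neighborhood, so no single inverse curve exists. The sketch below uses simple stand-in curves purely for illustration.

```python
import numpy as np
from scipy.ndimage import uniform_filter

# Global tone curve: one function for every pixel, hence invertible.
def global_tone(x, gamma=1.0 / 2.2):
    return np.power(x, gamma)

def invert_global_tone(y, gamma=1.0 / 2.2):
    return np.power(y, 1.0 / gamma)  # exact inverse of global_tone

# Local tone mapping: the per-pixel curve depends on the local mean
# brightness, so the mapping cannot be undone with any single curve.
# This is exactly what breaks the global-inversion assumption.
def local_tone(x, size=31, strength=0.5):
    local_mean = uniform_filter(x, size=size)            # neighborhood brightness
    gamma = 1.0 / (2.2 - strength * (0.5 - local_mean))  # per-pixel gamma
    return np.power(x, gamma)
```

Modeling the local operator explicitly is what would allow a raw image reconstruction-aware compressor to invert it.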