Visual attention refers to the mechanism of dynamically and selectively focusing on a subset of the visual input for detailed analysis; it is part of early visual perception in primates. It has been successfully integrated into the design and implementation of many artificial visual recognition systems, with applications to image classification, object detection, object sequence recognition, image captioning, and visual question answering.
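To make the idea of selective focusing concrete, here is one common formulation, soft spatial attention, sketched in plain Python. It assumes the image has already been split into regions, each described by a feature vector, and that a query vector (a hypothetical input here) expresses what the system is currently looking for. Each region is scored against the query, the scores are normalized with a softmax, and the regions are summarized as a weighted average, so high-scoring regions dominate the result. This is an illustrative assumption, not a description of any specific system mentioned above.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(features: Sequence[Sequence[float]],
                   query: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Attend over per-region feature vectors using a query vector.

    features: one feature vector per image region (e.g. one per grid cell).
    query: hypothetical vector describing what the system is looking for.
    Returns (weights, context): the weights sum to 1, and the context is
    the attention-weighted average of the region features.
    """
    # Relevance of each region: dot product between its features and the query.
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    # Regions with large weights dominate the summary vector.
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, context
```

For instance, with region features `[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]` and query `[5.0, 0.0]`, nearly all of the weight goes to the first region, which is the "focus" the paragraph describes.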
3D object detection and pose estimation (ODPE) is an important area of computer vision with many real-world applications, such as robotics and augmented reality. Estimating an accurate pose from RGB cameras alone is challenging, because a color image does not directly measure the distance to the object's surface. Given the increasing popularity of RGB-D sensors such as the Microsoft Kinect, we propose a hybrid ODPE method that uses both RGB and depth (3D surface) information. The additional depth information is expected to significantly improve both the speed and the accuracy of the resulting system.
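One reason depth information helps is that each valid depth pixel can be converted directly into a 3D point on the object's surface. The sketch assumes an ideal pinhole camera with focal lengths `fx`, `fy` and principal point `cx`, `cy` (hypothetical parameters; a real sensor would supply calibrated values), metric depth values, and non-positive values marking missing measurements. It only illustrates back-projection of a depth image; it is not the proposed hybrid ODPE method.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def backproject_depth(depth: List[List[float]],
                      fx: float, fy: float,
                      cx: float, cy: float) -> List[Point3D]:
    """Convert a depth image into 3D points in the camera frame.

    depth[v][u] is the metric depth Z at pixel column u, row v.
    Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # no valid depth measurement at this pixel
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```

The resulting point set describes the visible surface geometry in metric units, which is the kind of 3D information an RGB image alone does not provide.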
The project aims to build a 3D object detection system that uses images from multiple cameras. The system will be trained on example instances of an object so that it can detect other instances of the same object. For example, to detect a sphere, we give the system a set of example images so that it learns what a sphere looks like. When the system later encounters a new object that looks similar to those examples, it will detect that object as a sphere. The system will be trained in the same way to detect more complex objects.
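The train-then-detect idea can be illustrated with a toy nearest-centroid classifier. The sketch assumes each image has already been reduced to a fixed-length feature vector (feature extraction is out of scope here), and `train`, `detect`, and the distance threshold `max_dist` are hypothetical names chosen for illustration. Training averages the example vectors of each object class into a prototype; detection assigns a new vector to the closest prototype and rejects it when no prototype lies within `max_dist`, so an object unlike any of the examples is not forced into a class. A real system would likely use a learned model rather than this simple rule.

```python
import math
from typing import Dict, List, Optional, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Mean of a non-empty list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def train(examples: Dict[str, Sequence[Sequence[float]]]) -> Dict[str, List[float]]:
    """Learn one prototype (mean feature vector) per object class."""
    return {label: centroid(vecs) for label, vecs in examples.items()}


def detect(model: Dict[str, List[float]],
           feature: Sequence[float],
           max_dist: float) -> Optional[str]:
    """Return the closest class label, or None if no prototype is close enough."""
    best_label: Optional[str] = None
    best_dist = math.inf
    for label, proto in model.items():
        dist = math.dist(feature, proto)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= max_dist else None


# Usage with made-up 2D features for two classes.
model = train({"sphere": [[1.0, 0.9], [0.9, 1.0]],
               "cube": [[0.0, 0.1], [0.1, 0.0]]})
label = detect(model, [1.0, 1.0], max_dist=0.5)
```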