Visual-Haptic Representations for Zero-Shot Learning
Humans recognise objects by leveraging multi-modal sensory inputs that go beyond visual cues such as images and videos. Touch-based (haptic) information carries rich cues about structure, shape, and other objectness properties. In this work, we study and learn cross-modal representations between vision and touch. To connect the two modalities, we introduce a zero-shot classification task: recognising unseen object categories from the ShapeNet dataset using haptic signals alone. We train our model to encode haptic information into a view-agnostic embedding space that captures the geometric properties of each object. To support our claims, we use ShapeNet, a repository of CAD models spanning many object categories that can be rendered from different viewpoints, and the Johns Hopkins Modular Prosthetic Limb to collect haptic data. Our hypothesis is that the learnt representation transfers across modalities, enabling zero-shot classification and object retrieval.
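The zero-shot classification step described above can be sketched as follows. This is a minimal illustration under assumed stand-ins, not the proposed implementation: a random linear map plays the role of the trained haptic encoder, and random unit vectors stand in for the visual class prototypes that would in practice be computed from rendered ShapeNet views. The dimensions, category names, and helper functions are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for the raw haptic signal and the shared,
# view-agnostic embedding space.
HAPTIC_DIM, EMBED_DIM = 64, 16

# Stand-in haptic encoder: a random linear projection into the shared
# embedding space (a trained network in the actual system).
W = rng.normal(size=(HAPTIC_DIM, EMBED_DIM))

def encode_haptic(signal):
    """Project a haptic signal into the embedding space and L2-normalise."""
    z = signal @ W
    return z / np.linalg.norm(z)

# Prototypes for unseen categories; in practice these would come from a
# visual encoder applied to rendered ShapeNet views (random stand-ins here).
prototypes = {
    "mug": rng.normal(size=EMBED_DIM),
    "chair": rng.normal(size=EMBED_DIM),
    "lamp": rng.normal(size=EMBED_DIM),
}
prototypes = {k: v / np.linalg.norm(v) for k, v in prototypes.items()}

def zero_shot_classify(signal):
    """Assign the unseen category whose visual prototype is closest
    (by cosine similarity) to the haptic embedding."""
    z = encode_haptic(signal)
    return max(prototypes, key=lambda k: float(z @ prototypes[k]))

label = zero_shot_classify(rng.normal(size=HAPTIC_DIM))
```

Because both modalities are normalised into the same space, classification and retrieval reduce to nearest-neighbour search by cosine similarity, which is what lets categories never seen with haptic supervision still be recognised from their visual prototypes.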