Identifiability of latent factors through multiple self-supervision
Human perception has developed the ability to decompose scenes into fine grained elements. This lays the foundation for strong generalization to new situations where the base concepts can be recomposed to interpret objects never seen before. While it has been shown that, in the general case, proper decomposition is not possible, new paradigms provide provable decomposition in constrained environments. We hypothesize that the multiple sensory systems of human perception offer a strong signal for decomposing scenes in a proper way. While the 2-view approach has been explored in various forms in the machine learning community, we believe that the granularity of the decomposition depends on the number of views integrated.