Study of the Latent Space in NLP: Mathematical Foundation and Application to Disentanglement
Recent progress on word and sentence embeddings has enabled efficient representation and learning of complex, high-dimensional probability distributions over rich text data. The proposed research aims to address some of the fundamental questions in this field: What are the natural mathematical structures on these latent spaces? How can a meaningful basis be found? What is the best method of disentanglement for NLP? Through this collaboration, RBC Borealis AI will gain insight into some fundamental ideas in machine learning and natural language processing, become familiar with state-of-the-art disentanglement and embedding models, and make improvements to its products, such as news filtering, financial asset valuation, automated trading, and personalized reward programs.