Comparing and Improving Approaches to Topic Modeling
The proposed research project aims to evaluate and improve topic modeling, a technique in statistical natural language processing, so that it can be applied to real-life scenarios. Topic modeling allows the quick discovery of the main topics of a document collection, and thus automatically answers the question "What do these documents talk about?"
Several approaches to topic modeling have been proposed, but their evaluation has rarely taken the end use into account. In addition, the topics identified by such techniques are often based on single words and treated as the end result.
In this research, we wish to address two main issues: 1) the evaluation of topic modeling methods through a social validity assessment in real-life applications, and 2) the improvement of the extracted topics by basing them on linguistic units other than single words.