Multilingual Semantic Similarity Engine
To communicate with their end users, businesses regularly produce written documents such as letters, notices, and statements in various languages. A set of rules is usually used to ensure that the information in these documents is 'correct' and consistent across languages and communication channels. However, as the volume and variety of information sent to clients grows, it becomes difficult to preserve the semantics of client messages across variations in vocabulary and language. This project aims to create algorithms that measure the semantic similarity of two text documents regardless of the natural language in which each document is written. The similarity algorithms must scale with the size of the corpus.
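
One common way to make such a similarity measure concrete, offered here only as an illustrative sketch and not as the project's chosen method, is to map each document to a vector in a shared, language-independent space and compare the vectors by cosine similarity. The example below assumes such vectors already exist: the function name `cosine_similarity` and the toy vectors are hypothetical, and the step that would produce the vectors (for example, a multilingual encoder) is deliberately left out.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Toy, hand-made vectors standing in for document embeddings (assumption).
doc_en = [0.9, 0.1, 0.3]     # e.g. an English notice
doc_fr = [0.8, 0.2, 0.3]     # e.g. the same notice in French
doc_other = [0.1, 0.9, 0.0]  # e.g. an unrelated document

# The two versions of the notice should score higher than the unrelated pair.
print(cosine_similarity(doc_en, doc_fr) > cosine_similarity(doc_en, doc_other))
```

Comparing documents pair by pair in this way costs one computation per pair, so a system meant to scale with the corpus would likely need an index or an approximate nearest-neighbor search on top of it.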