Multilingual Semantic Search Engine using Multilingual Semantic Similarity
Many situations require cross-lingual search: lawyers reviewing litigation documents, intelligence analysts mining open-source data, and patent attorneys investigating technical documents. To imitate cross-lingual search, people laboriously look up equivalent terms on online translation platforms and then re-run the query in each language. The commercial search industry has not seen much demand for cross-lingual search, so search is typically monolingual and very English-centric. However, businesses regularly produce written documents in various languages to communicate with end users. A set of rules is therefore required to ensure that the information in these documents is 'correct' and consistent across languages and communication channels. This project aims to create algorithms that perform semantic search over a very large pool of multilingual, unstructured enterprise content with low overhead, regardless of the natural language of each document. The proposed algorithms must scale with the size of the corpus.
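As an illustration only, language-independent semantic search can be sketched as ranking documents by cosine similarity in a shared vector space. This is a minimal sketch under stated assumptions, not the project's algorithm: the vectors in `TOY_EMBEDDINGS` are invented by hand and stand in for the output of some multilingual sentence encoder, and the names `cosine` and `search` are hypothetical.

```python
import math

# Hypothetical toy vectors standing in for the output of a multilingual
# sentence encoder. The numbers are invented purely for illustration.
TOY_EMBEDDINGS = {
    "patent infringement claim": [0.90, 0.10, 0.00],
    "réclamation pour contrefaçon de brevet": [0.85, 0.15, 0.05],
    "Quartalsbericht zum Umsatz": [0.10, 0.90, 0.20],
    "quarterly revenue report": [0.12, 0.88, 0.18],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search(query_vec, corpus, top_k=2):
    """Return the top_k (score, text) pairs, best match first.

    This is a brute-force linear scan over every document vector.
    """
    scored = [(cosine(query_vec, vec), text) for text, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Query in English; the corpus holds French, German, and English documents.
query_text = "patent infringement claim"
query = TOY_EMBEDDINGS[query_text]
corpus = {t: v for t, v in TOY_EMBEDDINGS.items() if t != query_text}

results = search(query, corpus)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

With these toy vectors the French document ranks first for the English query, because the two sit close together in the shared space even though they share no words. The linear scan costs one comparison per document, so a corpus of the size described above would likely need an approximate nearest-neighbour index instead.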