Text mining is the process of automatically extracting knowledge from unstructured natural-language documents. It aims to support users in dealing with large amounts of textual information. Examples of specific text mining tasks are entity detection, summarization, and opinion mining. Because natural language is complex and ambiguous, the analysis is broken down into individual processing steps that draw on techniques from machine learning, natural language processing, and semantic computing.
The goal of this project is to enrich the text mining pipelines developed at KeaText for the processing of legal documents. Specifically, the analysis is to be extended with a topic segmentation module tailored to the domain and to the application requirements. Automatic topic segmentation, also known as text tiling, divides a document into individual parts, each representing a distinct theme. It is well known that topic segmentation can improve several information retrieval and text analysis tasks. The following tasks are to be completed: (1) a survey of the existing research literature to identify suitable methods and tools; (2) the design of a new topic segmentation algorithm specifically for legal documents; and (3) the implementation and evaluation of this algorithm based on the General Architecture for Text Engineering (GATE) framework.
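To make the idea of topic segmentation concrete, the following is a minimal sketch of a lexical-cohesion approach in the spirit of text tiling. It is not the algorithm designed in this project and it does not use the GATE API. The function names, the stopword list, the window size, and the threshold are all illustrative assumptions. The sketch compares the vocabulary on either side of each sentence gap and places a boundary where the two sides share almost no words.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {"the", "is", "on", "of", "each", "are", "shall", "a", "an", "and", "to", "in"}

def tokenize(sentence):
    """Lowercase word tokens without stopwords; a stand-in for a real NLP tokenizer."""
    return [t for t in re.findall(r"[a-z]+", sentence.lower()) if t not in STOPWORDS]

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def segment(sentences, window=2, threshold=0.1):
    """Group sentences into segments, splitting at gaps with low lexical cohesion."""
    if not sentences:
        return []
    bags = [Counter(tokenize(s)) for s in sentences]
    segments, current = [], [sentences[0]]
    for gap in range(1, len(sentences)):
        # Pool up to `window` sentences on each side of the gap.
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        if cosine(left, right) < threshold:
            segments.append(current)
            current = []
        current.append(sentences[gap])
    segments.append(current)
    return segments

sentences = [
    "The tenant shall pay rent monthly.",
    "Rent is due on the first day of each month.",
    "The landlord shall repair the roof.",
    "Roof repairs are the duty of the landlord.",
]
for part in segment(sentences):
    print(part)
```

On this toy input the two rent sentences and the two roof sentences share no content words across the middle gap, so that is where the sketch splits. Real legal documents would likely need domain-specific cues such as clause numbering or defined terms, which a purely lexical measure like this one ignores.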
Dr. Rene Witte
Information and communications technologies