Document Understanding

Managing digital content (documents, screenshots, webpages, etc.) spread across many apps and platforms is an increasingly pervasive problem for professionals and businesses alike. Most existing solutions focus on the storage and distribution of digital content, and there is still a gap in the market for a tool that addresses the content management problem by understanding the text of that content. Charli AI strives to address this gap by providing an easy-to-use platform for everything content-related, such as filing and folder organization, optimal search, team collaboration, and analysis-driven insights and actions such as reports and reminders.
We propose to conduct a set of empirical studies on document classification and information extraction for these documents. Specifically, character-level language models, pre-trained neural language models, and transfer learning techniques in zero-shot and few-shot settings will be studied. The studies aim to determine the best document embedding for document classification and the best Named Entity Recognition models for information extraction.
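
To make the few-shot classification setting concrete, the sketch below labels a document by comparing its embedding with the average embedding of a handful of labeled examples per class. It is an illustration only, not the project's method: it assumes a simple character trigram count vector as a stand-in for the character-level and pre-trained neural language models named above, and the function names, class labels, and sample texts are all hypothetical.

```python
import math
from collections import Counter


def char_ngram_embedding(text, n=3):
    """Embed text as a unit-length sparse vector of character n-gram counts.

    Assumption: this is a simple stand-in for a learned document embedding.
    """
    text = " ".join(text.lower().split())  # lowercase, collapse whitespace
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    norm = math.sqrt(sum(v * v for v in counts.values()))
    if norm == 0:
        return {}
    return {gram: v / norm for gram, v in counts.items()}


def cosine(a, b):
    """Dot product of two sparse vectors (cosine similarity for unit vectors)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(gram, 0.0) for gram, w in a.items())


def fit_centroids(examples):
    """Build one unit-length centroid per label from (text, label) pairs."""
    sums = {}
    for text, label in examples:
        acc = sums.setdefault(label, Counter())
        for gram, w in char_ngram_embedding(text).items():
            acc[gram] += w
    centroids = {}
    for label, acc in sums.items():
        norm = math.sqrt(sum(v * v for v in acc.values()))
        centroids[label] = {g: v / norm for g, v in acc.items()} if norm else {}
    return centroids


def classify(text, centroids):
    """Return the label whose centroid is most similar to the text."""
    emb = char_ngram_embedding(text)
    return max(centroids, key=lambda label: cosine(emb, centroids[label]))


# Hypothetical few-shot support set: two labeled examples per class.
SUPPORT = [
    ("Invoice number 1042, total amount due 350.00 USD", "invoice"),
    ("Invoice for consulting services, payment due in 30 days", "invoice"),
    ("Meeting notes: agenda items and action items for the team", "notes"),
    ("Notes from the weekly meeting with the design team", "notes"),
]

centroids = fit_centroids(SUPPORT)
print(classify("Invoice 2077: amount due 99.00 USD", centroids))
print(classify("Meeting notes for the team", centroids))
```

Replacing `char_ngram_embedding` with a different embedding function while keeping the nearest-centroid step fixed is one way candidate document embeddings could be compared on the same few-shot task.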

Faculty Supervisor:

Fatemeh Hendijani Fard

Student:

Partner:

Charli

Discipline:

Computer science

Sector:

Information and cultural industries

University:

The University of British Columbia - Okanagan

Program: