AIDOX – Document Verification System

Existing document understanding systems validate structured trade contracts for correct language and economic terms by using machine learning, natural language understanding, and text analysis to extract data elements from financial documents. This approach can be extended to a wide variety of financial documents, especially customer-provided reference material. The project focuses on document information extraction: extracting the key data elements from financial documents and validating them against the internal source of record.
The existing OCR-based approach suffers from error propagation (especially on noisy input), slow processing caused by the large model size, and the need for retraining whenever a new document class is introduced. To address these issues, the first objective is to improve OCR performance by evaluating OCR engines in combination with pre-processing and post-processing techniques. The second objective is to implement OCR-free models that learn jointly from image and text context.
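
As a rough illustration of the validation step described above, the sketch below compares extracted fields against a source-of-record entry after light normalization. It is a minimal sketch under stated assumptions, not the project's implementation: the field names (`counterparty`, `notional`, `currency`) and the helpers `normalize_text`, `normalize_amount`, and `validate_fields` are hypothetical and chosen only for the example.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def normalize_text(value: str) -> str:
    """Collapse whitespace and case so cosmetic differences are not mismatches."""
    return " ".join(value.split()).casefold()


def normalize_amount(value: str) -> Optional[Decimal]:
    """Parse an amount, ignoring currency symbols, commas, and spaces."""
    cleaned = "".join(ch for ch in value if ch.isdigit() or ch in ".-")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None  # unparseable text is treated as "no value"


def validate_fields(extracted, record, amount_fields=frozenset({"notional"})):
    """Return {field: (extracted_value, record_value)} for each disagreement.

    `extracted` holds values pulled from the document; `record` holds the
    source-of-record values. Both are hypothetical dict layouts.
    """
    mismatches = {}
    for field, expected in record.items():
        actual = extracted.get(field)
        if actual is None:
            # The extractor did not find this field at all.
            mismatches[field] = (None, expected)
            continue
        if field in amount_fields:
            parsed = normalize_amount(actual)
            same = parsed is not None and parsed == normalize_amount(expected)
        else:
            same = normalize_text(actual) == normalize_text(expected)
        if not same:
            mismatches[field] = (actual, expected)
    return mismatches


extracted = {"counterparty": "ACME  Corp", "notional": "$1,000,000.00", "currency": "USD"}
record = {"counterparty": "Acme Corp", "notional": "1000000", "currency": "CAD"}
result = validate_fields(extracted, record)
```

Here only `currency` would be reported, since the counterparty and notional values agree once formatting differences are removed. Normalizing before comparison is one simple form of post-processing; a real system would likely need field-specific rules.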

Faculty Supervisor:

Gerald Penn

Student:

Partner:

Scotiabank

Discipline:

Computer science

Sector:

Finance and Insurance

University:

University of Toronto

Program:

Accelerate
