AIDOX – Document Verification System
To validate structured trade contracts for correctness of language and economic terms, existing document understanding systems use machine learning, natural language understanding, and text analysis to extract data elements from financial documents. This approach can be extended to a wide variety of financial documents, especially customer-provided reference material. The project focuses on document information extraction: extracting the key data elements from financial documents and validating them against the internal source of record.
The existing OCR-based approach suffers from error propagation (especially on noisy input), slow processing due to the large model size, and the need for retraining on each new document class. To address these issues, the first objective is to improve OCR engine performance by exploring OCR engines combined with pre-processing and post-processing techniques. The second objective is to implement OCR-less models that learn jointly from image and text context.
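As a rough illustration of OCR post-processing followed by validation against a source of record, the sketch below normalizes noisy extracted fields and reports which ones disagree with the record. It is not the project's implementation: the field names (`notional`, `counterparty`, `currency`), the character substitution table, and the function names are all hypothetical and chosen only for the example.

```python
import re
from decimal import Decimal, InvalidOperation

# Hypothetical substitutions for common OCR digit confusions (an assumption,
# not taken from the project). Applied only to fields expected to be numeric.
OCR_DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5"})


def normalize_amount(raw):
    """Return a Decimal parsed from a noisy OCR amount string, or None."""
    cleaned = raw.translate(OCR_DIGIT_FIXES)
    # Drop currency symbols, thousands separators, and whitespace.
    cleaned = re.sub(r"[^0-9.\-]", "", cleaned)
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def normalize_text(raw):
    """Collapse whitespace and case so cosmetic differences are ignored."""
    return " ".join(raw.split()).casefold()


def validate(extracted, record, amount_fields=("notional",)):
    """Compare extracted fields with the record; return mismatched field names."""
    mismatches = []
    for field, expected in record.items():
        raw = extracted.get(field)
        if raw is None:
            # A field missing from the extraction counts as a mismatch.
            mismatches.append(field)
            continue
        if field in amount_fields:
            ok = normalize_amount(raw) == Decimal(expected)
        else:
            ok = normalize_text(raw) == normalize_text(expected)
        if not ok:
            mismatches.append(field)
    return mismatches


if __name__ == "__main__":
    extracted = {"notional": "$1,O00,0OO.00", "counterparty": "  ACME  Corp "}
    record = {"notional": "1000000", "counterparty": "Acme Corp", "currency": "USD"}
    print(validate(extracted, record))
```

In this sketch, a misread amount such as `$1,O00,0OO.00` is repaired before comparison, while a field that was never extracted is flagged for review rather than silently accepted.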
Gerald Penn
Scotiabank
Computer science
Finance and Insurance
University of Toronto
Accelerate