Table extraction for logistics and supply chain industries using machine learning - QC-465

Project type: Research
Desired discipline(s): Engineering - computer / electrical, Engineering, Computer science, Mathematical Sciences, Mathematics
Company: Mely.AI Technologies
Project Length: Longer than 1 year
Preferred start date: As soon as possible.
Language requirement: English
Location(s): Montreal, QC, Canada
No. of positions: flexible
Desired education level: College, Undergraduate/Bachelor, Master's, PhD, Postdoctoral fellow
Search across Mitacs’ international networks - check this box if you’d also like to receive profiles of researchers based outside of Canada: 
No

About the company: 

Mely.ai is an AI solutions company that helps enterprises in supply chain and logistics accelerate their digital transformation.

Our proprietary Smart Document Extraction engine automatically, accurately, and rapidly extracts key information from documents, including lengthy commercial invoices, complicated packing lists, and non-standardized bills of lading or waybills. This removes back-office inefficiencies and saves 90% of the time and 80% of the cost by eliminating manual labour from low-value activities such as data entry.

Describe the project.: 

Automated table extraction is an active field of research. Irregularities such as nested headers, merged cells, and missing grid lines make generalized approaches difficult. In this project, we hope to apply the latest deep learning research to read borderless tables in standardized forms such as invoices, certificates of analysis, and other corporate documents.

Our software platform currently reads templated manifests, but has difficulty reading tables. By developing this technology, we can help logistics and supply chain companies in their digital transformation journeys. The candidate may be able to see the direct impact of their work reflected on our platform.

The candidate would have the following responsibilities:

  • Overseeing and developing the models needed to build a generic, highly accurate table extraction model
  • Setting up an MLOps pipeline to measure model performance in a production setting
  • Working with the current data scientist and the software development and business teams to develop a commercial product

We mostly expect to use computer vision and deep learning algorithms but may also use natural language processing algorithms. Methodologies may evolve as the project develops with the candidate.

Required expertise/skills: 

The ideal candidate would have these specific skillsets:

  • Background in form and document extraction
  • Experience with computer vision
  • Experience with deep learning in the computer vision domain
  • Experience deploying models to a cloud environment (AWS, Azure, GCP)
  • Presentation skills to offer insights about how the technology can help expand the commercial side of the business
  • Python is a must
  • Proficient in English
  • Experience with OpenCV, Keras, TensorFlow, or scikit-learn

Experience with natural language processing is nice to have but not required.