Automatic Document Segmentation and Identification - ON-124

Preferred Disciplines and Level: Computer Science, Computational Linguistics, Computer Engineering. Post-Doc or current Ph.D. Student.
Company: Crater Labs Inc.
Project Length: 8-12 months (2 units)
Desired start date: As soon as possible
Location: Toronto, Ontario
No. of Positions: 1
Preferences: We would prefer to work with a university in the Greater Toronto Area or Southern Ontario.
Language: English, Bilingual

About the Company: 

We are a Toronto-based studio specializing in the use of computer vision, predictive analytics and natural language processing to build intelligence into business applications.

Project Description:

Semi-structured business documents such as questionnaires and requests for information (RFIs) contain questions or statements that require a response. Currently, most NLP-driven algorithms for extracting and identifying document structure are template-driven, typically combining formatting analysis with statement analysis. We seek to examine whether deep-learning methods can be used to segment a document and identify the components that require a response.

Research Objectives:

  • Based on the available data, determine the level of accuracy (if any) a neural network can achieve in segmenting a document and identifying its relevant components
  • Hypothesize the degree to which additional data (currently available or not) could increase the accuracy of this system

Methodology:

  • Identify the appropriate neural network architecture for this type of problem
  • Train a neural network using source documents and labelled relevant text, and examine the efficacy of the model on the provided sample business documents
  • Examine the results to determine the degree of accuracy in segmenting a document and identifying its response components
  • Create a prototype model

Expertise and Skills Needed:

  • Familiarity with basic natural language processing and deep-learning concepts
  • Familiarity with deep learning libraries such as TensorFlow or Caffe
  • Knowledge of Python, matplotlib, NumPy, pandas and Jupyter Notebooks

For more information or to apply to this applied research position, please:

  1. Check your eligibility and find more information about open projects.

  2. Interested students need to obtain approval from their supervisor, then apply through the webform, submitting their CV along with a link to their supervisor's university webpage.


Program: