Automatic Document Segmentation and Identification - ON-124
Preferred Disciplines and Level: Computer Science, Computational Linguistics, Computer Engineering. Post-Doc or Existing Ph.D. Student.
Company: Crater Labs Inc.
Project Length: 8-12 months (2 units)
Desired start date: As soon as possible
Location: Toronto, Ontario
No. of Positions: 1
Preferences: We would prefer to work with a university in the Greater Toronto Area or Southern Ontario.
Language: English, Bilingual
About the Company:
We are a Toronto-based studio specializing in the use of computer vision, predictive analytics and natural language processing to build intelligence into business applications.
Semi-structured business documents such as questionnaires and requests for information (RFIs) contain questions or statements that require a response. Currently, most NLP-driven algorithms for extracting and identifying document structure use template-driven approaches, typically combining formatting analysis with statement analysis. We seek to examine whether deep-learning methods can be employed to segment a document and identify the components that require a response.
- Based on the available data, determine what level of accuracy (if any) a neural network can achieve in segmenting and identifying the relevant components of a document
- Hypothesize the degree to which other data (available or unavailable) may be able to increase the accuracy of this system
- Identify the appropriate neural network for solving this type of problem
- Train a neural network using source documents and labelled relevant text, and examine the efficacy of the model on the provided sample business documents
- Examine the results and determine the degree of accuracy in segmenting and identifying the response components of a document
- Create a prototype model
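As an illustration of how the "degree of accuracy" in the steps above might be quantified, here is a minimal sketch in Python. It assumes that each response component is represented as a (start, end) character span and that a predicted span counts as correct only when it matches a labelled span exactly. The span representation, the exact-match rule, the function name span_prf and the sample values are all assumptions made for illustration; the project does not prescribe a metric.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: a response component is a (start, end)
# character span within the source document.
Span = Tuple[int, int]


def span_prf(gold: Iterable[Span], predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 between labelled and predicted spans."""
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(predicted)

    # A prediction is a true positive only if it matches a labelled span exactly.
    true_pos = len(gold_set & pred_set)

    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up example: three labelled spans, four predicted spans, two exact matches.
gold = [(0, 42), (80, 131), (200, 260)]
predicted = [(0, 42), (80, 131), (150, 190), (300, 320)]
p, r, f1 = span_prf(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Exact matching is the strictest choice; a more lenient variant could credit partially overlapping spans, which may be more appropriate when segment boundaries are ambiguous.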
Expertise and Skills Needed:
- Familiarity with basic natural language processing and deep-learning concepts
- Familiarity with deep learning libraries such as TensorFlow or Caffe
- Knowledge of Python, matplotlib, numpy, pandas and Jupyter Notebooks
For more info or to apply to this applied research position, please