Improving Existing Algorithms for Sentence Vectorization, Semantic Similarity Scoring, and Topic Clustering - ON-141

Preferred Disciplines: Computer Science and Computer Engineering (Master, PhD, Post-Doc)
Company: DeepPiXEL
Project Length: 8-12 months
Desired start date: ASAP
Location: Toronto, Ontario
No. of Positions: 1-2
Preferences: Waterloo, UofT, Queen's

About the Company: 

DeepPiXEL Inc. delivers solutions to corporations that want to increase customer engagement with their products, services, and engagement channels. We take the questions that arrive through any given channel and answer them with high accuracy. We focus on repetitive questions, which make up as much as 80% of all customer inquiries, and leave the most complicated questions to live agents, whom our product, CARA, assists by suggesting responses. This lets each support agent handle more simultaneous chats while reducing the time customers wait for answers.

By using CARA to serve their own clients, our customers improve those clients’ brand experience. Benefits include improvements in response times, average handling time, customer satisfaction, first-call resolution, quality scores, agent satisfaction, adherence and conformance, and agent productivity.

Project Description:

This project aims to improve our existing algorithms for: 1) vectorizing English-language sentences, 2) scoring semantic similarity and paraphrase, and 3) topic modelling and clustering.

Our product uses natural language models to identify similar phrases, and it scores the similarity of two vectors with a proprietary scoring algorithm. We also analyze the data we collect to gain insight into how the product is used and to quantify the benefits it produces.
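
Because the scoring algorithm is proprietary, the sketch below does not reproduce it. It shows a common baseline, assumed here for illustration only: average the word vectors in each sentence and compare the averages with cosine similarity. The `WORD_VECTORS` table and both function names are hypothetical; the table is toy data standing in for trained embeddings such as word2vec.

```python
import math

# Hypothetical toy word vectors; a real system would load trained
# embeddings (for example from word2vec) instead of this hand-written table.
WORD_VECTORS = {
    "reset":    [0.9, 0.1, 0.0],
    "password": [0.8, 0.2, 0.1],
    "change":   [0.7, 0.3, 0.0],
    "my":       [0.1, 0.1, 0.1],
    "opening":  [0.0, 0.2, 0.9],
    "hours":    [0.1, 0.1, 0.8],
}

def sentence_vector(sentence, vectors=WORD_VECTORS):
    """Average the vectors of known words (a simple baseline, not doc2vec)."""
    words = [w for w in sentence.lower().split() if w in vectors]
    if not words:
        raise ValueError("no known words in sentence")
    dim = len(next(iter(vectors.values())))
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

With this toy table, "Reset my password" scores higher against "Change my password" than against "opening hours". Averaging ignores word order and silently drops unknown words, so "don't reset my password" receives exactly the same vector as "reset my password"; this is the kind of negative case a better scoring algorithm has to handle.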

Research Objectives:

  • Vectorize English-language phrases and paragraphs quickly and accurately.
  • Improve the scoring algorithms so that their scores reliably correlate with human judgments of semantic similarity and paraphrase, including for negative cases.
  • Produce insights quickly and reliably by analyzing the data collected.
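
One common way to check whether similarity scores track human judgments, assumed here rather than taken from this posting, is Spearman rank correlation between model scores and human ratings collected for the same sentence pairs. A minimal pure-Python sketch follows; the function names are hypothetical.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for idx in order[start:end + 1]:
            ranks[idx] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; raises if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread = math.sqrt(sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y))
    if spread == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / spread

def spearman(model_scores, human_scores):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(model_scores) != len(human_scores) or len(model_scores) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    return pearson(average_ranks(model_scores), average_ranks(human_scores))
```

Rank correlation only asks whether the model orders pairs the way people do, so it tolerates scores on a different scale from the human ratings. Including pairs that share many words but are not paraphrases keeps negative cases in the evaluation.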

Methodology:

  • Vectorizing algorithms: word2vec, doc2vec, skip-thought.
  • Cluster analysis: k-means, nearest neighbour, approximate nearest neighbour.
  • Classification analysis: logistic regression, naive bayes, random forest.
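
As a concrete reference point for the cluster-analysis item, here is a minimal k-means (Lloyd's algorithm) sketch in plain Python, assuming each sentence has already been converted to a fixed-length vector. The `kmeans` function is hypothetical; it uses Euclidean distance and random initial centroids drawn from the data. A production system would more likely use a library implementation such as scikit-learn's `KMeans`, and normalizing the vectors first is a common choice for text embeddings.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Cluster equal-length float vectors into k groups with Lloyd's algorithm.

    Returns (labels, centroids): labels[i] is the cluster index of points[i].
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)  # fixed seed makes runs reproducible
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids
```

k-means needs the number of clusters up front and can settle in a local optimum depending on the initial centroids, so in practice it is usually run with several seeds or a smarter initialization.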

Expertise and Skills Needed:

    • Python, web servers (Flask/Sanic), databases (PostgreSQL), Git
    • Natural Language Processing
    • Knowledge of artificial intelligence algorithms
    • Ability to research different approaches and conduct experiments independently

    For more information or to apply for this applied research position, please:

    1. Check your eligibility and find more information about open projects.

    2. Complete this webform. You will be asked to upload your CV. Remember to indicate the title of the project(s) you are interested in and obtain your professor’s approval to proceed!

    3. Interested students need their supervisor’s approval and should send their CV, along with a link to their supervisor’s university webpage, by applying through the webform or directly to Jillian Hatnean at jhatnean(at)