Embedding space for computer source code - ON-205

Preferred Disciplines: Computer Science, Data Science, Natural Language Processing (Masters or PhD)
Company: Irdeto
Project Length: 12-18 months (3 units)
Desired start date: As soon as possible
Location: Ottawa, ON (Kanata)
No. of Positions: 1
Preferences: National Capital Region

About the Company: 

We protect digital platforms and applications for media & entertainment, connected transport, and IoT connected industries. Our solutions and services enable our customers to protect their brand and revenues, create new offerings and fight cybercrime. We think holistically with a true end-to-end solution; it’s not enough to excel in one area of security. We differentiate ourselves by having a 360-degree view of security that is as deep as it is comprehensive, and we use that view to solve business challenges and anticipate the challenges on the horizon. Because our tailored solutions empower our customers to continually adapt and grow with the changing times, we build a strong relationship with each of them. With nearly 50 years of expertise in security, Irdeto’s software security technology and cyber services protect over 5 billion devices and applications for some of the world’s best known brands.

Project Description:

Research and apply modern natural language processing methodologies (embeddings, autoencoders) to computer source code.

  • Find word embeddings specific to code that allow for meaningful analysis.
  • Use machine learning to develop models that provide contextual information about computer source code.
  • Implement machine learning algorithms in Python, such as contrastive divergence, backpropagation, and skip-grams.
  • Produce visualizations using PCA, t-SNE, and other methods to demonstrate the effectiveness of the research.
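
As a rough illustration of what "skip-grams applied to source code" can mean, the sketch below turns a Python snippet into (center, context) token pairs, the training examples a skip-gram embedding model would consume. It uses only the standard-library tokenize module. The helper names code_tokens and skipgram_pairs, the sample snippet, and the window size are assumptions made for this example and are not part of the project specification.

```python
import io
import tokenize

def code_tokens(source):
    """Return the lexical tokens of a Python source string, skipping layout tokens."""
    # Layout-only token types carry no vocabulary and are dropped (an assumption
    # of this sketch; a real study might keep indentation as a signal).
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
            tokenize.DEDENT, tokenize.COMMENT, tokenize.ENDMARKER}
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type not in skip]

def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

# Hypothetical sample input, chosen only for illustration.
sample = "def add(a, b):\n    return a + b\n"
tokens = code_tokens(sample)
pairs = skipgram_pairs(tokens, window=1)
```

Pairs like these would then be fed to an embedding model, and the learned vectors projected with PCA or t-SNE for inspection.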

Research Objectives:

  • Find the boundaries of effectiveness for a source-code autoencoder.
  • Measure how effective different architectures (model type, learning algorithm) are with respect to mapping source code to a coordinate space.

Methodology:

  • Natural language processing, such as word embeddings
  • Deep learning, autoencoders, restricted Boltzmann machines, contrastive divergence, backpropagation

Expertise and Skills Needed:

  • Python
  • Abstract and creative thinking
  • Machine learning
  • Deep learning
  • Natural language processing
  • Autoencoders
  • Problem solving

For more information or to apply for this applied research position, please:

  1. Check your eligibility and find more information about open projects.
  2. Interested students need approval from their supervisor and should send their CV, along with a link to their supervisor’s university webpage, through the webform or directly to Mel Chaar.
Program: