Automated error detection for quality assurance in Web Development - QC-181

Preferred Disciplines: Computer Science, Machine Learning, Web Development (Masters)
Project length: 4-6 months (1 unit)
Approx. start date: February - June 2019
Location: Montreal, QC
No. of Positions: 1
Preferences: None
Company: Evolving Web

About Company:

Evolving Web is a team of innovative designers and technologists based in Montreal. We specialize in creating high-quality websites and applications with the Drupal content management system. We work with clients of all types and sizes. Over the past eleven years, we’ve completed projects for numerous universities and businesses, as well as non-profit and cultural organizations. Our clients include Princeton University Press, McGill University, Western Digital, the Linux Foundation, and the governments of Quebec and Canada.

Summary of Project:

Websites dynamically produced by content management systems such as Drupal, WordPress and Craft are expected to deliver a reliable user experience. However, over its lifetime a typical commercial website undergoes frequent security updates and dynamic changes to some of its content elements. Any version change, however minor, requires precise verification of the pages served by the site to maintain accessibility and ease of navigation, and to avoid regressions that harm the user experience.

Evolving Web maintains an open-source crawler and QA tool for websites, which, like others in the industry, we use whenever possible to support the QA process and to find issues in existing code. To reduce the number of spurious differences it reports, our regression test tool lets users define custom comparators using a large algebra of standard DOM transformations, including string regular expressions.
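As an illustration of the regular-expression side of such a comparator, the sketch below normalizes two versions of a page before comparing them, so that content expected to change between deployments does not count as a difference. It is a minimal sketch under stated assumptions, not the tool's actual API: the rule list, the `token` attribute, and the names `normalize` and `pages_equivalent` are hypothetical, and it works on serialized markup rather than on a DOM.

```python
import re

# Hypothetical normalization rules. Each (pattern, replacement) pair masks
# content that legitimately changes between deployments, so it is not
# reported as a regression. Rules are applied in order.
RULES = [
    # Per-request form tokens (the attribute name "token" is illustrative).
    (re.compile(r'name="token" value="[^"]*"'), 'name="token" value="*"'),
    # ISO dates such as "Updated 2019-01-05".
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "<DATE>"),
    # Collapse runs of whitespace into a single space.
    (re.compile(r"\s+"), " "),
]


def normalize(html: str, rules=RULES) -> str:
    """Apply every rule in order and trim leading/trailing whitespace."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html.strip()


def pages_equivalent(before: str, after: str, rules=RULES) -> bool:
    """Treat two page versions as equivalent if they match after normalization."""
    return normalize(before, rules) == normalize(after, rules)


if __name__ == "__main__":
    old = '<p>Updated 2019-01-05</p>\n<input name="token" value="abc123">'
    new = '<p>Updated 2019-02-11</p>  <input name="token" value="zzz999">'
    print(pages_equivalent(old, new))
```

Keeping the rules as data rather than code means each site can extend the list without touching the comparison logic; any change a rule does not mask still surfaces as a difference.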

The goal of the proposed Mitacs work is to develop, as an extension to our tool, an oracle in the form of a machine learning classifier that scores irregularities and page differences within a supervised learning framework. The prototype will be delivered to our QA team, which will pilot a test program and provide feedback to improve the classifier's discriminative power.

Research Objectives/Sub-Objectives:

  • Extract and curate, from known historical results, a training set for supervised and unsupervised learning.
  • Define a feature vector for supervised learning and evaluate the effectiveness of classic algorithms (e.g., logistic regression, edit-distance NN) for clustering or error detection.
  • If successful, embed the resulting machine learning model in an effective human-computer interface.
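The first two objectives could be prototyped along the following lines. This is a minimal sketch, not the planned design: the three features (one minus a `difflib` similarity ratio, used here as a cheap proxy for a normalized edit distance; relative length change; relative change in tag count), the made-up training pairs, and the names `features`, `train_logistic` and `score` are all hypothetical. It fits a plain logistic regression by batch gradient descent using only the Python standard library.

```python
import difflib
import math


def features(before: str, after: str) -> list:
    """Hypothetical feature vector for a pair of page versions."""
    dissimilarity = 1.0 - difflib.SequenceMatcher(None, before, after).ratio()
    length_change = abs(len(after) - len(before)) / max(len(before), 1)
    tags_before = before.count("<")
    tag_change = abs(after.count("<") - tags_before) / max(tags_before, 1)
    return [dissimilarity, length_change, tag_change]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression weights by batch gradient descent.

    The returned list holds one weight per feature followed by the bias.
    """
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            # zip(w, xi) stops at len(xi), so the bias w[-1] is added separately.
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            err = sigmoid(z) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def score(w, x) -> float:
    """Estimated probability that a page difference is a real regression."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


# Toy, made-up examples: (before, after, label), 1 = regression, 0 = benign.
PAIRS = [
    ("<ul><li>Home</li><li>About</li></ul>", "<ul><li>Home</li><li>About</li></ul>", 0),
    ("<p>Posted 2019-01-05</p>", "<p>Posted 2019-01-06</p>", 0),
    ("<nav><a>Home</a><a>News</a></nav><main>Text</main>", "<main>Text</main>", 1),
    ("<form><input><button>Send</button></form>", "<form></form>", 1),
]
X = [features(before, after) for before, after, _ in PAIRS]
y = [label for _, _, label in PAIRS]
weights = train_logistic(X, y)

if __name__ == "__main__":
    print(score(weights, features("<div><h1>Title</h1><p>Body</p></div>", "<div></div>")))
```

In a real pilot the pairs would come from curated historical QA results, and a library implementation such as scikit-learn's `LogisticRegression` would replace the hand-written optimizer; the sketch only shows how a feature vector and a classifier fit together.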


Applying machine learning in this domain is relatively novel (see Related Work). We will initially follow a classic approach to supervised and unsupervised learning: defining a training set, extracting features, and applying known algorithms.

Related Work:

Much past research has applied machine learning and natural language processing to classification, clustering, and information extraction from web pages. In regression testing for websites and web applications, work has typically focused either on testing the web application as a software system in its own right (often modeled as a graph or state machine) or on page-by-page comparison, the latter being our specific area of interest. Pages can be compared either through visual differences or through user-defined equivalence relations between DOMs, expressed with various semantic operators. Although some work has been done on automating these tasks, detecting errors or small differences without constant, burdensome human interaction would require techniques different from those proposed so far.
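To make the DOM-based style of comparison concrete, the sketch below treats two pages as equivalent when their flattened sequences of start tags, end tags and text match after ignoring attribute order, whitespace, and a user-chosen set of volatile attributes. It relies on Python's standard `html.parser` module; the class and function names and the default ignored attributes (`nonce`, `data-build`) are hypothetical, and this is only one possible equivalence relation, not the one used by any existing tool.

```python
from html.parser import HTMLParser


class DomEvents(HTMLParser):
    """Flatten markup into (kind, ...) events, dropping ignored attributes."""

    def __init__(self, ignored_attrs):
        super().__init__()
        self.ignored_attrs = ignored_attrs
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Sort by attribute name so attribute order does not matter.
        kept = tuple(sorted(
            ((k, v) for k, v in attrs if k not in self.ignored_attrs),
            key=lambda kv: kv[0],
        ))
        self.events.append(("start", tag, kept))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Normalize internal whitespace and skip whitespace-only text nodes.
        text = " ".join(data.split())
        if text:
            self.events.append(("text", text))


def dom_events(html, ignored_attrs=frozenset()):
    parser = DomEvents(ignored_attrs)
    parser.feed(html)
    parser.close()
    return parser.events


def dom_equivalent(a, b, ignored_attrs=frozenset({"nonce", "data-build"})):
    """Equivalence relation: identical event sequences under the ignore set."""
    return dom_events(a, ignored_attrs) == dom_events(b, ignored_attrs)


if __name__ == "__main__":
    print(dom_equivalent('<a href="/x" class="btn">Go</a>',
                         '<a class="btn"  href="/x">Go</a>'))
```

Because the ignore set is a parameter, each comparator can encode its own notion of "same page" while structural or textual changes are still caught.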

Expertise and Skills Needed:

  • Scripting skills (any one of Python, JavaScript, PHP, Ruby)
  • Understanding of the fundamental principles of the WWW and web development technologies: HTTP, HTML, CSS, JavaScript
  • Interest in machine learning (both supervised and unsupervised) and human-computer interaction
  • Experience with content management systems is a plus (WordPress, Craft, Drupal, etc.)

For more info or to apply to this applied research position, please

  1. Check your eligibility and find more information about open projects.
  2. Interested students need to obtain approval from their supervisor and send their CV, along with a link to their supervisor’s university webpage, by applying through the webform or directly to Marianne Groleau.