Determine methods for interpretation of highly complex data regarding defects - ON-208

Preferred Disciplines: Computer Science with a focus on data analytics; post-Master's preferred
Company: Anonymous
Project Length: 1 year (2 units)
Desired start date: As soon as possible
Location: Kitchener ON, may be a split team with some in Toronto
No. of Positions: 1
Preferences: Colleges and Universities in Kitchener/Waterloo area. Strong preference for institutions without onerous IP requirements.

About the Company: 

The organization is an AI startup that harvests and enriches data on product defects for sale to corporations.

Project Description:

The aim of this project is to investigate methods for interpreting and explaining highly complex data that describe product-failure incidents, drawn from an evolving data lake. Success will be demonstrated by a dashboard, linked to the data lake, that presents insights obtained by interpreting the data.

Research Objectives:

  • Investigation and discovery of correlations and relationships relating to product defects.
  • Investigation and discovery of optimized methods for the interpretation and explanation of complex patterns across groups of product-failure incidents.
  • Identification of the best methods for presenting the insights that emerge from that interpretation.


The overall (multi-position) objective is to create a data lake with a visualization dashboard that can support analysis:

  1. Find unique sources of data that relate to reviews of consumer products
  2. Analyze these sources of data for added value and clean them
  3. Set up an online repository to collect these raw sources of data
  4. Implement the ability to tap into external APIs (e.g., weather, traffic) for additional raw data
  5. Load raw sources of data into a distributed database
  6. Attach a visualization tool that can support dashboard analysis
  7. Investigate any correlations using AI or statistical data science, e.g., between location and weather at the time of an incident
  8. Implement an emulation of a mobile app that can inject glitch data into the data lake and automatically look up external data
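
As a rough illustration of steps 4 and 7, the sketch below enriches incident records with a weather reading and then computes a Pearson correlation between temperature and failure counts. It is only a minimal sketch under stated assumptions: the record fields (`location`, `date`, `failures`), the `lookup_weather` callable standing in for an external weather API, and the toy values are all hypothetical and do not come from the project's data lake.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def enrich(incidents, lookup_weather):
    """Attach a temperature to each incident via a caller-supplied lookup.

    `lookup_weather(location, date)` is a hypothetical stand-in for a call
    to an external weather API; it is not a real library function.
    """
    return [
        {**inc, "temp_c": lookup_weather(inc["location"], inc["date"])}
        for inc in incidents
    ]


# Toy, made-up values used only to exercise the functions above.
TOY_WEATHER = {
    ("A", "day1"): -10.0,
    ("A", "day2"): -5.0,
    ("B", "day1"): 0.0,
    ("B", "day2"): 5.0,
}

TOY_INCIDENTS = [
    {"location": "A", "date": "day1", "failures": 8},
    {"location": "A", "date": "day2", "failures": 6},
    {"location": "B", "date": "day1", "failures": 4},
    {"location": "B", "date": "day2", "failures": 2},
]


def fake_weather(location, date):
    """Look up the toy temperature instead of calling a real API."""
    return TOY_WEATHER[(location, date)]


enriched = enrich(TOY_INCIDENTS, fake_weather)
r = pearson(
    [rec["temp_c"] for rec in enriched],
    [rec["failures"] for rec in enriched],
)
print(f"temperature vs. failures: r = {r:.2f}")
```

A correlation coefficient of this kind only flags a linear association worth a closer look; it says nothing about cause, and it would be affected by the kinds of bias and sparse data that can arise in scraped review sources.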


  • Data might be biased; for example, Amazon reviews may not be updated during a product's life cycle
  • Need to investigate novel sources of data that can supplement primary sources - data may not be readily available
  • Sheer quantity of data available (that can be scraped) is enormous

Expertise and Skills Needed:

  • An understanding of data science fundamentals when it comes to business intelligence and visualizing data
  • Experience with data visualization tools like Tableau, QlikView, Power BI
  • Knowledge of and/or experience with React is a plus
  • Knowledge of and/or experience with Python is a plus

For more information or to apply to this applied research position, please:

  1. Check your eligibility and find more information about open projects
  2. Obtain your supervisor’s approval, then apply through the webform, submitting your CV along with a link to your supervisor’s university webpage.