Unsupervised Machine Learning Approaches to Define the Underlying Structures in Data

In many real-world machine learning problems, labeled data is scarce while unlabeled data is abundant. Unlabeled data is typically collected routinely, but labeling it to build a training dataset for supervised learning is costly. A shortage of domain experts and the time-consuming nature of data labeling, especially in labs, make the process even harder. In such cases, unsupervised machine learning approaches are used to find the underlying patterns in data. In this project, we work with raw unlabeled data generated by fiber sensors. We will clean the data in several steps and investigate its potential underlying structures using unsupervised machine learning approaches such as data clustering and dimensionality reduction algorithms. Through model validation and evaluation, we will identify the faults that may contribute to data anomalies. Models with low error rates can be treated as trusted models for evaluating incoming unlabeled data. The ability to determine the underlying structures in data enables our partner to measure the consistency of the given data. A reliable consistency feedback model helps create uniform products in the fiber industry and in other industries that deal with unlabeled data.
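As a rough illustration of the clustering step described above, the following minimal sketch standardizes a set of readings, groups them with a plain k-means loop, and scores each reading by its distance to its assigned cluster centre. It is not the project's actual pipeline: the synthetic two-feature readings, the choice of k-means, the value k=2, and the helper names (standardize, kmeans, consistency_scores) are all assumptions made for illustration only.

```python
import random
import statistics


def standardize(rows):
    """Scale each column to zero mean and unit (population) variance."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means (Lloyd's algorithm) with seeded random initial centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([statistics.fmean(c) for c in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def consistency_scores(points, labels, centroids):
    """Distance of each point to its own centroid; larger means less consistent."""
    return [sq_dist(p, centroids[lab]) ** 0.5 for p, lab in zip(points, labels)]


# Synthetic stand-in for sensor readings (hypothetical values, two operating modes).
rng = random.Random(42)
group_a = [[rng.gauss(1.0, 0.05), rng.gauss(10.0, 0.5)] for _ in range(30)]
group_b = [[rng.gauss(2.0, 0.05), rng.gauss(20.0, 0.5)] for _ in range(30)]
readings = group_a + group_b

scaled = standardize(readings)
labels, centroids = kmeans(scaled, k=2)
scores = consistency_scores(scaled, labels, centroids)
```

Readings with unusually large scores would be candidates for closer inspection as possible anomalies. In practice the number of clusters and the distance threshold would have to be chosen through the validation and evaluation steps mentioned above.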

Faculty Supervisor:

Saeed Samet

Student:

Partner:

Instrumar

Discipline:

Computer science

Sector:

Manufacturing

University:

University of Windsor

Program: