Intelligent Matching Algorithm Design and Implementation for Internet Big Data

In today's Big Data era, scientists and business owners strive to extract accurate, real-time insights from large volumes of varied data arriving at high speed, which affects both human lifestyles and enterprise productivity. Advances in Internet and web technologies allow organizations to gather petabytes of structured or unstructured data from many types of sources on a daily basis, which enables them to derive tremendous insights about their customers, products and services. However, managing and processing big data in a timely manner demands IT solutions with greater agility, adaptability and performance. Specifically, algorithms should be designed to match customers with the right information, at the right time, and in the right format, given the customers' preferences.
In this project, we investigate three aspects of algorithm design and implementation for matching desired information to customers in the Internet big data environment.
1) Design of the customer preference model: To match customers with relevant information, it is imperative to model the customers' preferences correctly. The model should accurately reflect the customers' preferences regarding data content, timing and format of presentation. At the same time, the model should automatically adjust to changes in customers' preferences. It should also allow customers to configure their own preferences in the model.
2) Design of the information database and database updating mechanism: Information needed by the customers can be stored in the service provider's database and/or retrieved from other databases and/or webpages on the Internet. To provide customers with relevant information in a timely manner, the database structure has to be agile enough to accommodate unstructured data and to provide the infrastructure for highly efficient data traversal on a large scale. We will design algorithms to efficiently compute the relevance ratio between data points and customers. In addition, the computation results should be updated on a regular basis to reflect dynamic changes in customer preferences and in the data itself.
3) Design of the matching algorithms: We will design algorithms that operate on NoSQL databases to match information to customers. Due to the large scale of the data and the customer base, algorithmic complexity has to be managed in order to ensure the quality and responsiveness of the matching service. We will develop algorithms that operate on the graph representation of the data points. Efforts will be directed in particular toward design techniques that greatly improve the accuracy and efficiency of the algorithms by exploiting the specific characteristics of the graph structure. Previously accumulated knowledge of graph theory, algorithm design, scheduling and optimization will be leveraged to develop such algorithms.
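As a rough illustration of how the three aspects above could fit together, the following Python sketch models a customer's preferences as weighted tags, adjusts them after each interaction, and ranks items by a relevance score. Everything in it is an assumption made for illustration and not the project's actual design: the tag-weight representation, the exponential blending in update_preferences, the use of cosine similarity as the "relevance ratio", and the exhaustive pairwise scan in match_top_k (which stands in for the graph-based algorithms the project proposes) are all hypothetical choices.

```python
import math


def update_preferences(prefs, item_tags, rate=0.2):
    """Blend the tags of an item the customer engaged with into the profile.

    Assumed update rule: old weights decay by (1 - rate) and the item's
    tag weights are added in, scaled by rate.
    """
    updated = {tag: (1 - rate) * w for tag, w in prefs.items()}
    for tag, weight in item_tags.items():
        updated[tag] = updated.get(tag, 0.0) + rate * weight
    return updated


def relevance(prefs, item_tags):
    """Cosine similarity between a preference profile and an item's tags."""
    dot = sum(w * item_tags.get(tag, 0.0) for tag, w in prefs.items())
    norm_p = math.sqrt(sum(w * w for w in prefs.values()))
    norm_i = math.sqrt(sum(w * w for w in item_tags.values()))
    if norm_p == 0 or norm_i == 0:
        return 0.0
    return dot / (norm_p * norm_i)


def match_top_k(customers, items, k=2):
    """Return, per customer, up to k item ids with the highest positive relevance."""
    matches = {}
    for cid, prefs in customers.items():
        # Score every item; sort by descending score, then by id for stable ties.
        scored = sorted(
            ((relevance(prefs, tags), iid) for iid, tags in items.items()),
            key=lambda pair: (-pair[0], pair[1]),
        )
        matches[cid] = [iid for score, iid in scored[:k] if score > 0]
    return matches


# Hypothetical sample data: customer profiles and items as tag-weight maps.
customers = {
    "alice": {"sports": 1.0, "tech": 0.2},
    "bob": {"finance": 1.0},
}
items = {
    "a1": {"sports": 1.0},
    "a2": {"tech": 1.0},
    "a3": {"finance": 0.8, "tech": 0.2},
}
matches = match_top_k(customers, items, k=2)
```

The pairwise scan costs time proportional to the number of customers times the number of items, which is exactly the kind of complexity the project aims to avoid at scale; a graph representation would let the matching step visit only items connected to a customer's preferred tags.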

Faculty Supervisor:

Chun Wang

Student:

Nimisha Sharath

Partner:

Discipline:

Engineering - computer / electrical

Sector:

University:

Concordia University

Program:

Globalink