Parallel Software Acceleration of Meta-Heuristics for Data Mining

Over the last ten years, the needs of industry have made data mining one of the most important facets of Information Technology (IT). In simple terms, data mining is the automatic process of extracting interrelationships and patterns of interest from data. Today, companies around the world rely on data mining not only to discover knowledge from historical information, but to predict future  trends and behaviors, allowing businesses to make more informed decisions.

The goal of this project is two-fold: (1) To construct scalable meta-heuristics by exploiting the parallelism that is available in today’s commodity hardware; and (2) use these scalable, parallel meta-heuristics as part of an accelerator for data mining. With regards to (1) we propose to consider only populationbased metaheuristics, as these lend themselves most naturally to parallel  implementations. The metaheuristics that will be considered include genetic algorithms, greedy randomized adaptive search
procedures, estimation of distribution algorithms, particle swarm optimization, and ant colony algorithms. When seeking to parallelize these algorithms we plan to only exploit commodity hardware in the form of multi-core and many-core architectures. The decision to limit ourselves to commodity hardware makes our algorithms immediately accessible to anyone with a desktop PC with a Graphics Processing Unit (GPU). Currently, the latter contains an enormous amount of potential computing power with (simple) GPU cores outnumbering (complex) CPU cores by two orders-of-magnitude. By implementing the meta-heuristics directly on the GPU or a combination of CPU-GPU, we hope to achieve a speedup of at least one order-of-magnitude over software. Once complete, the metaheuristics from (1) will be used to develop a scalable, parallel accelerator for data mining. Our shortterm (12-week) goal is to use these parallel metaheuristics to construct scalable classifiers.

The proposed project research program consists of theoretical analysis supplemented with experimental prototyping. Thus, the student will not only gain knowledge in formal thinking, but also practical skills in developing complex parallel software systems.

Intern: 
Abhishek Agrawal
Faculty Supervisor: 
Dr. Gary Grewal
Province: 
Ontario
Discipline: