Compiler and runtime support for shared memory parallelization of data mining algorithms

Authors:
Xiaogang Li; Ruoming Jin; Gagan Agrawal
Affiliations:
Department of Computer and Information Sciences, Ohio State University, Columbus, OH (all authors)
Venue:
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Year:
2002

Abstract

Data mining techniques focus on finding novel and useful patterns or models in large datasets. Because of the volume of data to be analyzed, the amount of computation involved, and the need for rapid or even interactive analysis, data mining applications require parallel machines. We have been developing compiler and runtime support for building scalable implementations of data mining algorithms. Our work encompasses shared memory parallelization, distributed memory parallelization, and optimizations for processing disk-resident datasets. In this paper, we focus on compiler and runtime support for shared memory parallelization of data mining algorithms. We have developed a set of parallelization techniques that apply across algorithms for a variety of mining tasks. We describe the interface of the middleware in which these techniques are implemented. We then present compiler techniques for translating data parallel code to the middleware specification. Finally, we present a brief evaluation of our compiler using Apriori association mining and k-means clustering.
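The abstract does not name the individual parallelization techniques. As an illustration only, the Python sketch below shows one common way to parallelize a k-means iteration on shared memory. Each thread accumulates into its own private copy of the per-cluster sums and counts (a replicated reduction object), and the copies are merged after all threads finish, so the scan itself needs no locks. The names `nearest`, `local_reduction`, and `kmeans_step` are hypothetical and do not reflect the middleware's actual interface. Python threads are used only to show the structure; the global interpreter lock prevents real speedup for this pure-Python code.

```python
from concurrent.futures import ThreadPoolExecutor


def nearest(point, centroids):
    # Index of the centroid closest to `point` (squared Euclidean distance).
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
    )


def local_reduction(chunk, centroids, dim):
    # Hypothetical per-thread reduction object: coordinate sums and point
    # counts for each cluster. Each thread owns its copy, so no locking is needed.
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in chunk:
        k = nearest(point, centroids)
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += point[d]
    return sums, counts


def kmeans_step(points, centroids, num_threads=4):
    dim = len(centroids[0])
    # Split the dataset into contiguous chunks, roughly one per thread.
    size = max(1, (len(points) + num_threads - 1) // num_threads)
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = list(pool.map(lambda c: local_reduction(c, centroids, dim), chunks))
    # Merge phase: combine the replicated reduction objects element-wise.
    total_sums = [[0.0] * dim for _ in centroids]
    total_counts = [0] * len(centroids)
    for sums, counts in partials:
        for k in range(len(centroids)):
            total_counts[k] += counts[k]
            for d in range(dim):
                total_sums[k][d] += sums[k][d]
    # New centroid = mean of its assigned points; keep the old one if the cluster is empty.
    return [
        [s / total_counts[k] for s in total_sums[k]] if total_counts[k] else list(centroids[k])
        for k in range(len(centroids))
    ]
```

For example, with points `(0, 0)`, `(0, 1)`, `(10, 10)`, `(10, 11)` and initial centroids `(0, 0)` and `(10, 10)`, one call to `kmeans_step` returns `[[0.0, 0.5], [10.0, 10.5]]` for any thread count. Replication trades memory (one copy of the reduction state per thread) for the absence of synchronization during the scan; locking a single shared copy is the usual alternative when that state is too large to replicate.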