A translation system for enabling data mining applications on GPUs

Authors:
Wenjing Ma;Gagan Agrawal
Affiliations:
The Ohio State University, Columbus, OH, USA;The Ohio State University, Columbus, OH, USA
Venue:
Proceedings of the 23rd international conference on Supercomputing
Year:
2009

Citing 30
Cited 10

Algorithms for clustering data

Algorithms for clustering data
Compiling Fortran D for MIMD distributed-memory machines

Communications of the ACM
Static analysis to reduce synchronization costs in data-parallel programs

POPL '96 Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Bayesian classification (AutoClass): theory and results

Advances in knowledge discovery and data mining
Using integer sets for data-parallel program analysis and optimization

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
The role of associativity and commutativity in the detection and transformation of loop-level parallelism

ICS '98 Proceedings of the 12th international conference on Supercomputing
Adaptive reduction parallelization techniques

Proceedings of the 14th international conference on Supercomputing
SQLEM: fast clustering in SQL using the EM algorithm

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Data mining: concepts and techniques

Data mining: concepts and techniques
Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey

Data Mining and Knowledge Discovery
The Paradigm Compiler for Distributed-Memory Multicomputers

Computer
Parallel Programming with Polaris

Computer
Maximizing Multiprocessor Performance with the SUIF Compiler

Computer
Parallel Mining of Association Rules

IEEE Transactions on Knowledge and Data Engineering
Compiling Global Name-Space Parallel Loops for Distributed Execution

IEEE Transactions on Parallel and Distributed Systems
Photon mapping on programmable graphics hardware

Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware
Compiler Optimization of Implicit Reductions for Distributed Memory Multiprocessors

IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
GPU Cluster for High Performance Computing

Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Exploring Graphics Processor Performance for General Purpose Applications

DSD '05 Proceedings of the 8th Euromicro Conference on Digital System Design
GPUTeraSort: high performance graphics co-processor sorting for large database management

Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Accelerator: using data parallelism to program GPUs for general-purpose uses

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
A performance-oriented data parallel virtual machine for GPUs

ACM SIGGRAPH 2006 Sketches
MapReduce: simplified data processing on large clusters

OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
A compiler framework for optimization of affine loop nests for gpgpus

Proceedings of the 22nd annual international conference on Supercomputing
Mars: a MapReduce framework on graphics processors

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
OpenMP to GPGPU: a compiler framework for automatic translation and optimization

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Parallelizing user-defined and implicit reductions globally on multiprocessors

ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
Initial experiences porting a bioinformatics application to a graphics processor

PCI'05 Proceedings of the 10th Panhellenic conference on Advances in Informatics
A graphics hardware accelerated algorithm for nearest neighbor search

ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part IV

Accelerating SQL database operations on a GPU with CUDA

Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations

Proceedings of the 24th ACM International Conference on Supercomputing
An integer programming framework for optimizing shared memory use on GPUs

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
GRace: a low-overhead mechanism for detecting data races in GPU programs

Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Practical loop transformations for tensor contraction expressions on multi-level memory hierarchies

CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
An execution strategy and optimized runtime support for parallelizing irregular reductions on modern GPUs

Proceedings of the international conference on Supercomputing
A GPU-based high-throughput image retrieval algorithm

Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
Dense affinity propagation on clusters of GPUs

PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I
HC-CART: A parallel system implementation of data mining classification and regression tree (CART) algorithm on a multi-FPGA system

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Optimizing tensor contraction expressions for hybrid CPU-GPU execution

Cluster Computing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Modern GPUs offer much computing power at a very modest cost. Even though CUDA and other related recent developments are accelerating the use of GPUs for general purpose applications, several challenges still remain in programming the GPUs. Thus, it is clearly desirable to be able to program GPUs using a higher-level interface. In this paper, we offer a solution that targets a specific class of applications, which are the data mining and scientific data analysis applications. Our work is driven by the observation that a common processing structure, that of generalized reductions, fits a large number of popular data mining algorithms. In our solution, the programmers simply need to specify the sequential reduction loop(s) with some additional information about the parameters. We use program analysis and code generation to map the applications to a GPU. Several additional optimizations are also performed by the system. We have evaluated our system using three popular data mining applications, k-means clustering, EM clustering, and Principal Component Analysis (PCA). The main observations from our experiments are as follows. The speedup that each of these applications achieve over a sequential CPU version ranges between 20 and 50. The automatically generated version did not have any noticeable overheads compared to hand written codes. Finally, the optimizations performed in the system resulted in significant performance improvements.