Iterative statistical kernels on contemporary GPUs

Authors:
Thilina Gunarathne; Bimalee Salpitikorala; Arun Chauhan; Geoffrey Fox
Affiliations:
School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA (all four authors)
Venue:
International Journal of Computational Science and Engineering
Year:
2013

Abstract

We present a study of OpenCL implementations of three important kernels that occur frequently in iterative statistical applications: multi-dimensional scaling (MDS), PageRank and K-means clustering. We evaluated their performance on NVIDIA Tesla and Fermi GPGPU cards using dedicated hardware and, in the case of Fermi, also in the Amazon EC2 cloud-computing environment. We explored the optimisation of these kernels using four main techniques: (1) caching invariant data in GPU memory across iterations; (2) selectively placing data in different memory levels; (3) rearranging data in memory; and (4) dividing the work between the GPU and the CPU. We also implemented a novel algorithm for MDS and a novel data layout scheme for PageRank. Our optimisations yielded performance improvements of up to 5–6× over naïve OpenCL implementations and up to 100× over a single-core CPU. We believe that these categories of optimisation also apply to other, similar kernels.
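The first technique, caching loop-invariant data across iterations, can be illustrated with K-means. In the standard Lloyd iteration the input points never change between iterations, while only the small k × d centroid array does. What follows is a minimal pure-Python sketch under stated assumptions, not the authors' OpenCL code. The function names (`kmeans`, `assign`, `host_to_device_bytes`) are hypothetical. The transfer model is also a simplifying assumption: it supposes that a naive GPU version re-copies all points every iteration, that a cached version copies them once, and that each value is a 4-byte single-precision float.

```python
# Hypothetical sketch: K-means (Lloyd iteration) written so that the
# loop-invariant data (points) is kept separate from the per-iteration state
# (centroids). In a GPU version, `points` is the buffer one would copy to
# device memory once, before the loop; only `centroids` changes each iteration.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for each point (ties go to the lower index)."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def kmeans(points, centroids, iterations):
    """Run `iterations` assignment/update rounds; returns (centroids, labels)."""
    centroids = [list(c) for c in centroids]  # copy: loop-variant state, k x d values
    dims = len(points[0])
    for _ in range(iterations):
        labels = assign(points, centroids)  # reads the invariant point set
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(p[i] for p in members) / len(members)
                                for i in range(dims)]
    return centroids, assign(points, centroids)


def host_to_device_bytes(n, d, k, iterations, cache_points, bytes_per_value=4):
    """Simplified transfer model (an assumption, not measured data):
    naive  -> points and centroids are copied every iteration;
    cached -> points are copied once, centroids every iteration."""
    point_bytes = n * d * bytes_per_value
    centroid_bytes = k * d * bytes_per_value
    if cache_points:
        return point_bytes + iterations * centroid_bytes
    return iterations * (point_bytes + centroid_bytes)


if __name__ == "__main__":
    pts = [[0, 0], [0, 1], [10, 10], [10, 11]]
    print(kmeans(pts, [[0, 0], [10, 10]], iterations=5))
    print(host_to_device_bytes(1000, 2, 3, 10, cache_points=False))
    print(host_to_device_bytes(1000, 2, 3, 10, cache_points=True))
```

Take 1,000 two-dimensional points, 3 centroids and 10 iterations. Under this model, the naive strategy moves 80,240 bytes to the device and the cached strategy moves 8,240 bytes. When the point data dominates the centroid traffic, the saving approaches a factor equal to the number of iterations.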