Clustering billions of data points using GPUs

Authors:
Ren Wu; Bin Zhang; Meichun Hsu
Affiliations:
Hewlett Packard Company, Palo Alto, CA, USA (all authors)
Venue:
Proceedings of the combined workshops on UnConventional high performance computing workshop plus memory access workshop
Year:
2009

Abstract

In this paper, we report our research on using GPUs to accelerate the clustering of very large data sets, which are common in today's real-world applications. While many published works have shown that GPUs can accelerate various general-purpose applications with respectable performance gains, few attempts have been made to tackle very large problems. Our goal here is to investigate whether GPUs can be useful accelerators even for very large data sets that cannot fit into a GPU's onboard memory. We use a popular clustering algorithm, K-Means, as an example, and our results have been very positive. On a data set with a billion data points, our GPU-accelerated implementation achieved an order-of-magnitude performance gain over a highly optimized CPU-only version running on 8 cores, and a gain of more than two orders of magnitude over a popular benchmark, MineBench, running on a single core.
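
The abstract does not describe the implementation, so the following is only a minimal sketch of one way K-Means can process a data set larger than the available memory: the points are streamed in fixed-size chunks, and only per-cluster sums and counts are kept between chunks. It is plain CPU Python, not the authors' GPU code. The names `kmeans_pass`, `kmeans`, and `chunk_size` are hypothetical, and a chunk here merely stands in for a batch small enough to fit in a GPU's onboard memory.

```python
from typing import Iterable, List, Sequence

Point = Sequence[float]


def nearest(point: Point, centroids: List[List[float]]) -> int:
    """Return the index of the centroid closest to `point` (squared Euclidean)."""
    best, best_dist = 0, float("inf")
    for idx, centroid in enumerate(centroids):
        dist = sum((p - q) ** 2 for p, q in zip(point, centroid))
        if dist < best_dist:
            best, best_dist = idx, dist
    return best


def chunks(data: Sequence[Point], size: int) -> Iterable[Sequence[Point]]:
    """Yield consecutive slices of at most `size` points."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def kmeans_pass(data: Sequence[Point], centroids: List[List[float]],
                chunk_size: int) -> List[List[float]]:
    """Run one K-Means iteration, touching the data one chunk at a time."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]   # running coordinate sums per cluster
    counts = [0] * k                         # running point counts per cluster
    for chunk in chunks(data, chunk_size):
        # Assumption: in a GPU version, this loop body is the work done on
        # one batch after it has been copied to device memory.
        for point in chunk:
            j = nearest(point, centroids)
            counts[j] += 1
            for d in range(dim):
                sums[j][d] += point[d]
    # A cluster that received no points keeps its previous centroid.
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
        for j in range(k)
    ]


def kmeans(data: Sequence[Point], centroids: List[List[float]],
           chunk_size: int, iterations: int = 10) -> List[List[float]]:
    """Repeat passes until the centroids stop changing or `iterations` is hit."""
    for _ in range(iterations):
        updated = kmeans_pass(data, centroids, chunk_size)
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

Because only the sums and counts persist between chunks, the memory needed beyond the current chunk depends on the number of clusters and dimensions, not on the number of data points. For example, `kmeans([(0, 0), (0, 2), (10, 10), (10, 12)], [[0, 0], [10, 10]], chunk_size=3)` gives the same centroids as any other chunk size.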