Accelerating K-Means on the Graphics Processor via CUDA

Authors:
Mario Zechner; Michael Granitzer
Venue:
INTENSIVE '09 Proceedings of the 2009 First International Conference on Intensive Applications and Services
Year:
2009

Abstract

In this paper, an optimized k-means implementation on the graphics processing unit (GPU) is presented. NVIDIA's Compute Unified Device Architecture (CUDA), available from the G80 GPU family onwards, is used as the programming environment. Emphasis is placed on optimizations that directly target this architecture to make the best use of its computational capabilities. Additionally, drawbacks and limitations of previous related work, such as limits on the maximum number of instances, dimensions and centroids, are addressed. The algorithm is hybrid: distance calculations are parallelized on the GPU, while the cluster centroids are updated sequentially on the CPU based on the GPU results. An empirical performance study on synthetic data demonstrates a speedup of up to 14x over a fully SIMD-optimized CPU implementation.
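The hybrid split described in the abstract can be illustrated with a minimal sketch in plain Python, written for readability rather than as CUDA. It is not the authors' implementation: the names `assign_labels`, `update_centroids` and `kmeans` are hypothetical, the paper's kernel layout and memory optimizations are not reproduced, and the nearest-centroid selection is assumed to happen in the same stage as the distance calculations. The assignment stage (the part the paper runs on the GPU) executes sequentially here, but each instance's search is independent of the others, which is what would let it map onto many parallel GPU threads. The centroid update (the CPU side in the paper) is an ordinary mean over cluster members.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def assign_labels(points: Sequence[Vector], centroids: Sequence[Vector]) -> List[int]:
    """Distance stage (GPU side in the paper; sequential here).

    Every point is handled independently, so this loop is the part that
    lends itself to data-parallel execution.
    """
    labels = []
    for p in points:
        best, best_d = 0, math.inf
        for j, c in enumerate(centroids):
            # Squared Euclidean distance: the square root does not change the argmin.
            d = sum((a - b) ** 2 for a, b in zip(p, c))
            if d < best_d:  # strict '<' breaks ties toward the lower centroid index
                best, best_d = j, d
        labels.append(best)
    return labels


def update_centroids(points: Sequence[Vector], labels: Sequence[int],
                     centroids: Sequence[Vector]) -> List[Vector]:
    """Update stage (CPU side in the paper): each centroid becomes its members' mean."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, lbl in zip(points, labels):
        counts[lbl] += 1
        for i, x in enumerate(p):
            sums[lbl][i] += x
    # Assumed policy: an empty cluster keeps its previous centroid.
    return [[s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
            for j in range(k)]


def kmeans(points: Sequence[Vector], initial_centroids: Sequence[Vector],
           max_iter: int = 100) -> Tuple[List[Vector], List[int]]:
    """Alternate the two stages until the assignments stop changing."""
    centroids = [list(c) for c in initial_centroids]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = assign_labels(points, centroids)
        if new_labels == labels:
            break  # stable assignments mean the centroids are already up to date
        labels = new_labels
        centroids = update_centroids(points, labels, centroids)
    return centroids, labels


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    print(kmeans(pts, [[0.0, 0.0], [10.0, 10.0]]))
    # → ([[0.0, 0.5], [10.0, 10.5]], [0, 0, 1, 1])
```

In a real hybrid implementation, only the label array (one integer per instance) would need to travel back from the device each iteration, while the much smaller centroid array is sent to it; the sketch keeps both stages in one process and ignores that transfer cost.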