An Efficient k-Means Algorithm on CUDA

  • Authors:
  • Jiadong Wu;Bo Hong

  • Affiliations:
  • -;-

  • Venue:
  • IPDPSW '11 Proceedings of the 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

The $k$-means algorithm is widely used for unsupervised clustering. This paper describes an efficient CUDA-based $k$-means algorithm. Different from existing GPU-based k-means algorithms, our algorithm achieves better efficiency by utilizing the triangle inequality. Our algorithm explores the trade-off between load balance and memory access coalescing through data layout management. Because the effectiveness of the triangle inequity depends on the input data, we further propose a hybrid algorithm that adaptively determines whether to apply the triangle inequality. The efficiency of our algorithm is validated through extensive experiments, which demonstrate improved performance over existing CPU-based and CUDA-based k-means algorithms, in terms of both speed and scalability.