A highly parallel implementation of k-means for multithreaded architecture

  • Authors:
  • Patrick Mackey; John Feo; Pak Chung Wong; Yousu Chen

  • Affiliations:
  • Researcher at Pacific Northwest National Laboratory; Chief Scientist at Pacific Northwest National Laboratory; Chief Scientist at Pacific Northwest National Laboratory; Senior Research Engineer at Pacific Northwest National Laboratory

  • Venue:
  • Proceedings of the 19th High Performance Computing Symposia
  • Year:
  • 2011

Abstract

We present a parallel implementation of the popular k-means clustering algorithm for massively multithreaded computer systems, as well as a parallelized version of the KKZ seed selection algorithm. We demonstrate that as system size increases, sequential seed selection can become a bottleneck. We also present an early attempt at parallelizing k-means that highlights critical performance issues when programming massively multithreaded systems. For our case studies, we used data collected from electric power simulations and ran our experiments on the Cray XMT.
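
For readers unfamiliar with KKZ seed selection, the following is a minimal sequential sketch of the heuristic as it is commonly described: the first seed is the point with the largest Euclidean norm, and each later seed is the point farthest from its nearest already-chosen seed. It is not the authors' Cray XMT implementation; the function name `kkz_seeds` and the tuple-based point representation are assumptions made for illustration only.

```python
import math


def kkz_seeds(points, k):
    """Pick k seeds from `points` with the KKZ heuristic (sequential sketch)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    # First seed: the point with the largest Euclidean norm.
    first = max(range(len(points)), key=lambda i: math.hypot(*points[i]))
    seeds = [first]

    # min_dist[i] holds the distance from point i to its nearest chosen seed.
    min_dist = [math.dist(p, points[first]) for p in points]

    while len(seeds) < k:
        # Next seed: the point whose nearest seed is farthest away.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        seeds.append(nxt)
        # Each entry is updated independently of the others, so this loop is
        # the natural candidate for splitting across threads (assumption:
        # a parallel version would also need a parallel arg-max reduction).
        min_dist = [
            min(d, math.dist(p, points[nxt])) for d, p in zip(min_dist, points)
        ]

    return [points[i] for i in seeds]


if __name__ == "__main__":
    sample = [(0, 0), (1, 0), (10, 0), (10, 10), (0, 9)]
    print(kkz_seeds(sample, 3))
```

Every new seed requires a full pass over the data, so a sequential version costs on the order of k passes before clustering even starts, which is consistent with the abstract's remark that seed selection can become a bottleneck as the system grows.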