K-Means for Parallel Architectures Using All-Prefix-Sum Sorting and Updating Steps

Authors:
Kai J. Kohlhoff; Vijay S. Pande; Russ B. Altman
Affiliations:
Stanford University, Stanford; Stanford University, Stanford; Stanford University, Stanford
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2013

Citing 0
Cited 1

In-place transposition of rectangular matrices on accelerators

Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming

Abstract

We present an implementation of parallel $K$-means clustering, called $K_{ps}$-means, that achieves high performance with near-full occupancy compute kernels without imposing limits on the number of dimensions and data points permitted as input, thus combining flexibility with high degrees of parallelism and efficiency. As a key element to performance improvement, we introduce parallel sorting as data preprocessing and updating steps. Our final implementation for Nvidia GPUs achieves speedups of up to 200-fold over CPU reference code and of up to three orders of magnitude when compared with popular numerical software packages.
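
The abstract does not spell out the sorting and updating steps, so the following is only a minimal sequential sketch of the general idea, not the authors' GPU implementation. It assumes that points are grouped by cluster label with a counting sort whose segment offsets come from an exclusive all-prefix-sum over per-cluster counts, and that each centroid is then recomputed from its contiguous segment. The function names `exclusive_scan`, `sort_by_label`, and `update_centroids` are hypothetical and chosen for illustration.

```python
def exclusive_scan(counts):
    """Exclusive all-prefix-sum: out[i] is the sum of counts[0..i-1]."""
    out, running = [], 0
    for c in counts:
        out.append(running)
        running += c
    return out


def sort_by_label(points, labels, k):
    """Counting sort of points by cluster label (assumed labels in 0..k-1).

    Returns the reordered points plus each cluster's segment offset and size.
    """
    counts = [0] * k
    for lab in labels:
        counts[lab] += 1
    offsets = exclusive_scan(counts)  # start index of each cluster's segment
    cursor = list(offsets)            # next free slot per cluster
    sorted_points = [None] * len(points)
    for p, lab in zip(points, labels):
        sorted_points[cursor[lab]] = p
        cursor[lab] += 1
    return sorted_points, offsets, counts


def update_centroids(points, labels, old_centroids):
    """Recompute centroids from label-sorted, contiguous segments.

    Empty clusters keep their previous centroid (an assumption of this sketch).
    """
    k = len(old_centroids)
    sorted_points, offsets, counts = sort_by_label(points, labels, k)
    new_centroids = []
    for c in range(k):
        if counts[c] == 0:
            new_centroids.append(old_centroids[c])
            continue
        segment = sorted_points[offsets[c]:offsets[c] + counts[c]]
        dims = len(segment[0])
        new_centroids.append(
            tuple(sum(p[d] for p in segment) / counts[c] for d in range(dims))
        )
    return new_centroids
```

On a parallel architecture, the appeal of this layout is that members of one cluster sit next to each other in memory, so the per-cluster sums become reductions over contiguous segments; the loops above stand in for what would be parallel histogram, scan, and scatter kernels.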