A frequently used method of clustering is k-means. The k-means algorithm consists of two steps: a map step, which is simple to execute on a GPU, and a reduce step, which is more problematic. Previous researchers have used a hybrid approach in which the map step is computed on the GPU and the reduce step is performed on the CPU. In this work, we present a new algorithm for irregular reductions and apply it to k-means so that the GPU executes both the map and reduce steps. We provide experimental comparisons using OpenCL. On an ATI Radeon® HD 5870, our scheme is 3.2 times faster than the hybrid scheme for k = 10, an average of 1.5 times faster for k = 100, and on average equal for k = 400; the best observed speedup over the hybrid approach was 3.5 times. In addition, we compare the GPU code with MineBench, a standard OpenMP benchmark in which both the map and reduce steps are computed on the CPU. For large data sizes, the new GPU scheme shows great promise, with performance up to 35 times faster than MineBench on a four-core Intel i7 CPU.
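The two-step structure the abstract refers to can be sketched in plain NumPy (an illustration only, not the paper's OpenCL implementation; the function name and array shapes are assumptions). The map step is embarrassingly parallel, while the reduce step is an irregular reduction because the number of points per cluster is data-dependent:

```python
import numpy as np

def kmeans_step(points, centroids):
    """One k-means iteration, split into the two steps described above.

    points:    (n, d) array of data points
    centroids: (k, d) array of current cluster centers
    """
    # Map step: compute squared distances from every point to every
    # centroid, then label each point with its nearest centroid.
    # Each point is independent, which is why this maps well to a GPU.
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)

    # Reduce step: recompute each centroid as the mean of its assigned
    # points. Cluster sizes vary with the data, making this an
    # irregular reduction -- the part the hybrid scheme left to the CPU.
    k = centroids.shape[0]
    new_centroids = centroids.copy()
    for c in range(k):
        members = points[labels == c]
        if len(members) > 0:  # keep an empty cluster's old centroid
            new_centroids[c] = members.mean(axis=0)
    return labels, new_centroids
```

Iterating `kmeans_step` until the labels stop changing yields the full algorithm; the paper's contribution is performing the second step on the GPU as well, rather than shipping the labeled points back to the CPU each iteration.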