Journal of Computer and System Sciences
Cluster analysis plays a critical role in a wide variety of applications, but it now faces a computational challenge from continuously increasing data volumes. Parallel computing is one of the most promising ways to overcome this challenge. In this paper, we parallelize k-Means, one of the most popular clustering algorithms, on widely available Graphics Processing Units (GPUs). In contrast to existing GPU-based k-Means algorithms, our approach treats data dimensionality as an important factor in parallelizing k-Means on GPUs. In particular, we use two different strategies, one for low-dimensional data sets and one for high-dimensional data sets, to make the best use of GPU computing power. For low-dimensional data sets, we design an algorithm that exploits GPU on-chip registers to significantly decrease data access latency. For high-dimensional data sets, we design another novel algorithm that simulates matrix multiplication and exploits GPU on-chip shared memory to achieve a high compute-to-memory-access ratio. Our experimental results show that our GPU-based k-Means algorithms are three to eight times faster than the best reported GPU-based algorithms.
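
To make the link between k-Means and matrix multiplication concrete, the following is a minimal CPU-side sketch in plain Python, not the GPU kernels of the paper. It relies on the standard identity ||x - c||^2 = ||x||^2 - 2 x·c + ||c||^2, under which the dot products x·c for all points and centroids are exactly the entries of a matrix product. Whether the paper uses this exact formulation is not stated in the abstract; the function names `assign_clusters` and `update_centroids` are hypothetical and chosen only for illustration.

```python
def assign_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    # Squared centroid norms are computed once and reused for every point.
    centroid_sq = [sum(v * v for v in c) for c in centroids]
    labels = []
    for x in points:
        best_k, best_score = 0, float("inf")
        for k, c in enumerate(centroids):
            # x . c is one entry of the matrix product (points) x (centroids)^T.
            dot = sum(a * b for a, b in zip(x, c))
            # ||x||^2 is identical for every k, so it is dropped from the comparison.
            score = centroid_sq[k] - 2.0 * dot
            if score < best_score:
                best_k, best_score = k, score
        labels.append(best_k)
    return labels


def update_centroids(points, labels, centroids):
    """Recompute each centroid as the mean of the points assigned to it."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for x, k in zip(points, labels):
        counts[k] += 1
        for j in range(dim):
            sums[k][j] += x[j]
    # An empty cluster keeps its previous centroid (an assumption of this sketch).
    return [
        [s / counts[k] for s in sums[k]] if counts[k] else list(centroids[k])
        for k in range(len(centroids))
    ]


# One k-Means iteration on a toy data set (illustrative values only).
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids = [[0.0, 0.0], [10.0, 10.0]]
labels = assign_clusters(points, centroids)
centroids = update_centroids(points, labels, centroids)
```

On a GPU, the inner dot-product loop is the part that would plausibly be tiled through on-chip shared memory in the manner of a blocked matrix multiply, since each loaded tile of points and centroids is reused across many dot products. For low-dimensional data a whole point fits in a few registers, so that tiling matters less.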