Automatic text processing
Scatter/Gather: a cluster-based approach to browsing large document collections
SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
LogP: a practical model of parallel computation
Communications of the ACM
ACM Computing Surveys (CSUR)
Clustering Algorithms
Refining Initial Points for K-Means Clustering
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Efficient Disk-Based K-Means Clustering for Relational Databases
IEEE Transactions on Knowledge and Data Engineering
Text document clustering based on neighbors
Data & Knowledge Engineering
W-kmeans: clustering news articles using wordNet
KES'10 Proceedings of the 14th international conference on Knowledge-based and intelligent information and engineering systems: Part III
A time-efficient pattern reduction algorithm for k-means clustering
Information Sciences: an International Journal
Distance-Based multiple paths quantization of vocabulary tree for object and scene retrieval
ACCV'09 Proceedings of the 9th Asian conference on Computer Vision - Volume Part I
Compression-aware I/O performance analysis for big data clustering
Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
A clustering technique for news articles using WordNet
Knowledge-Based Systems
Hi-index | 0.00 |
In this paper, we propose a new parallel clustering algorithm, named Parallel Bisecting k-means with Prediction (PBKP), for message-passing multiprocessor systems. Bisecting k-means tends to produce clusters of similar sizes, and according to our experiments, it produces clusters with smaller entropy (i.e., purer clusters) than k-means does. Our PBKP algorithm fully exploits the data-parallelism of the bisecting k-means algorithm, and adopts a prediction step to balance the workloads of multiple processors to achieve a high speedup. We implemented PBKP on a cluster of Linux workstations and analyzed its performance. Our experimental results show that the speedup of PBKP is linear with the number of processors and the number of data points. Moreover, PBKP scales up better than the parallel k-means with respect to the dimension and the desired number of clusters.