Parallel bisecting k-means with prediction clustering algorithm

Authors: Yanjun Li; Soon M. Chung
Affiliation (both authors): Department of Computer Science and Engineering, Wright State University, Dayton, USA 45435
Venue: The Journal of Supercomputing
Year: 2007

Abstract

In this paper, we propose a new parallel clustering algorithm, named Parallel Bisecting k-means with Prediction (PBKP), for message-passing multiprocessor systems. Bisecting k-means tends to produce clusters of similar sizes, and according to our experiments, it produces clusters with smaller entropy (i.e., purer clusters) than k-means does. Our PBKP algorithm fully exploits the data parallelism of the bisecting k-means algorithm and adopts a prediction step to balance the workloads of the processors and thereby achieve a high speedup. We implemented PBKP on a cluster of Linux workstations and analyzed its performance. Our experimental results show that the speedup of PBKP is linear in the number of processors and in the number of data points. Moreover, PBKP scales up better than parallel k-means with respect to the data dimensionality and the desired number of clusters.