Clustering web documents is an important procedure in many web information retrieval systems. As the Internet grows rapidly and the number of information requests increases exponentially, parallel computing techniques become unavoidable in large-scale web document retrieval. We propose a parallel hybrid web document clustering algorithm that combines the Principal Direction Divisive Partitioning (PDDP) algorithm with the K-means algorithm. Computational experiments were conducted on three real-life web document datasets to test the performance of the hybrid algorithm, and the results were compared with those of the parallel PDDP algorithm and the parallel K-means algorithm. The experiments show that the clustering solutions obtained from the hybrid algorithm are of better quality than those from either parallel PDDP or parallel K-means. The parallel run time of the hybrid algorithm is similar to, and sometimes less than, that of the widely used K-means algorithm.
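The abstract does not spell out how PDDP and K-means are combined, so the following is only a minimal serial sketch of one plausible reading: PDDP produces an initial partition, and K-means then refines it. Everything here is an assumption made for illustration, not the authors' method: the function names (`pddp`, `kmeans_refine`, `hybrid_cluster`), the use of power iteration to approximate the principal direction, the rule of always splitting the cluster with the largest scatter, and the dense list-of-lists document vectors. No parallelization is shown.

```python
import math
import random


def centroid(rows):
    """Mean vector of a non-empty list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sqdist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def scatter(rows):
    """Total squared distance of the rows to their centroid."""
    c = centroid(rows)
    return sum(sqdist(r, c) for r in rows)


def principal_direction(rows, iters=100, seed=0):
    """Approximate the leading principal direction of the centered rows
    by power iteration (an illustrative stand-in for an SVD routine)."""
    c = centroid(rows)
    centered = [[x - m for x, m in zip(r, c)] for r in rows]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in c]
    for _ in range(iters):
        # w = A^T (A v), without forming the covariance matrix explicitly
        proj = [sum(a * b for a, b in zip(r, v)) for r in centered]
        w = [sum(p * r[j] for p, r in zip(proj, centered)) for j in range(len(c))]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return c, v


def pddp(rows, k):
    """Divisive step: repeatedly split the cluster with the largest scatter
    by the sign of each point's projection onto the principal direction."""
    clusters = [list(range(len(rows)))]
    while len(clusters) < k:
        i = max(range(len(clusters)),
                key=lambda n: scatter([rows[j] for j in clusters[n]]))
        members = clusters[i]
        c, v = principal_direction([rows[j] for j in members])
        side = {j: sum((x - m) * d for x, m, d in zip(rows[j], c, v)) <= 0.0
                for j in members}
        left = [j for j in members if side[j]]
        right = [j for j in members if not side[j]]
        if not left or not right:
            break  # degenerate cluster, cannot be split further
        clusters[i:i + 1] = [left, right]
    return clusters


def kmeans_refine(rows, clusters, max_iter=20):
    """Refinement step: K-means iterations seeded with the PDDP centroids."""
    centers = [centroid([rows[j] for j in c]) for c in clusters]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda i: sqdist(r, centers[i]))
                      for r in rows]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for i in range(len(centers)):
            assigned = [r for r, lab in zip(rows, labels) if lab == i]
            if assigned:  # keep the old center if a cluster empties out
                centers[i] = centroid(assigned)
    return labels


def hybrid_cluster(rows, k):
    """PDDP for the initial partition, then K-means refinement."""
    return kmeans_refine(rows, pddp(rows, k))


# Toy usage with made-up 2-D "document vectors": two visibly separate groups.
docs = [[1.0, 0.0], [0.9, 0.1], [1.1, 0.0],
        [0.0, 1.0], [0.1, 0.9], [0.0, 1.1]]
labels = hybrid_cluster(docs, 2)
```

Seeding K-means from a deterministic divisive partition removes its dependence on random initial centroids, which is one commonly cited motivation for hybrids of this kind; whether the paper uses exactly this ordering is not stated in the abstract.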