An Improved Hierarchical K-Means Algorithm for Web Document Clustering

  • Authors:
  • Yongxin Liu;Zhijng Liu

  • Venue:
  • ICCSIT '08 Proceedings of the 2008 International Conference on Computer Science and Information Technology
  • Year:
  • 2008

Abstract

To address the major challenges of current web document clustering, namely the huge volume of documents and high-dimensional processing, we propose a simple agglomerative hierarchical K-Means clustering (SAHKC) algorithm based on the H-K (hierarchical K-Means) algorithm. This paper also uses a new model to describe web documents, the multiple feature vector space model (MFVSM). Experimental results indicate that MFVSM helps improve the quality of the clustering result and that, compared with the H-K algorithm, SAHKC reduces running time by nearly 30%, while the average precision of the clustering result drops by only about 10%.
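
The abstract does not spell out the steps of H-K or SAHKC, so the following is only a minimal sketch of the general idea that the name "hierarchical K-Means" suggests: run K-Means with more clusters than needed, then agglomeratively merge the closest centroids until the target number remains. It is not the authors' SAHKC algorithm. All function names (`kmeans`, `agglomerate`, `hierarchical_kmeans`), the evenly spaced initialization, and the centroid-distance merge rule are assumptions made for illustration, and the toy 2-D points stand in for document vectors, not for MFVSM.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda c: dist(point, centroids[c]))


def kmeans(points, k, iters=20):
    """Plain K-Means with a deterministic, evenly spaced initialization
    (an assumption of this sketch; it requires 1 <= k <= len(points))."""
    step = len(points) // k
    centroids = [list(points[i * step]) for i in range(k)]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


def agglomerate(centroids, sizes, target):
    """Repeatedly merge the two clusters with the closest centers until
    `target` clusters remain. Merged centers are size-weighted means."""
    clusters = [
        {"center": list(c), "size": s, "members": [i]}
        for i, (c, s) in enumerate(zip(centroids, sizes))
        if s > 0
    ]
    while len(clusters) > target:
        pairs = (
            (a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        i, j = min(
            pairs,
            key=lambda ab: dist(clusters[ab[0]]["center"], clusters[ab[1]]["center"]),
        )
        a, b = clusters[i], clusters[j]
        total = a["size"] + b["size"]
        clusters[i] = {
            "center": [
                (x * a["size"] + y * b["size"]) / total
                for x, y in zip(a["center"], b["center"])
            ],
            "size": total,
            "members": a["members"] + b["members"],
        }
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def hierarchical_kmeans(points, k_initial, k_final):
    """Over-cluster with K-Means, then merge centroids down to k_final."""
    centroids, labels = kmeans(points, k_initial)
    sizes = [labels.count(c) for c in range(k_initial)]
    merged = agglomerate(centroids, sizes, k_final)
    mapping = {m: idx for idx, cl in enumerate(merged) for m in cl["members"]}
    return [mapping[lab] for lab in labels]


# Toy stand-in for document vectors: two well-separated groups in 2-D.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(hierarchical_kmeans(points, k_initial=4, k_final=2))
```

On this toy input the first four points end up in one cluster and the last four in another. Merging a handful of centroids is much cheaper than agglomerating every document, which is the usual motivation for combining the two methods. How SAHKC achieves its reported running-time reduction is not stated in the abstract.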