Fuzzy combinations of criteria: an application to web page representation for clustering
CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part II
To address the major challenges of current web document clustering, namely the huge volume of documents and high-dimensional processing, we propose a simple agglomerative hierarchical K-Means clustering (SAHKC) algorithm based on the H-K (hierarchical K-Means) algorithm. We also use a new model to describe web documents, the multiple feature vector space model (MFVSM). Experimental results indicate that the MFVSM helps improve the quality of the clustering results and that, compared with the H-K algorithm, SAHKC reduces running time by nearly 30% while the average precision of the clustering results drops by only about 10%.
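The abstract does not spell out how SAHKC works, so the sketch below shows only the general H-K idea it builds on, under an assumed formulation: agglomerative (centroid-linkage) clustering is run until k clusters remain, and their centroids seed a standard K-Means refinement. The function names (agglomerative_seeds, kmeans, hierarchical_kmeans) are hypothetical, documents are assumed to be already encoded as numeric vectors (the MFVSM representation is not reproduced), and Euclidean distance is an assumption.

```python
import math


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def agglomerative_seeds(points, k):
    """Assumed H-K seeding step: centroid-linkage merging until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        cents = [mean(c) for c in clusters]
        best = None
        # Naive search for the closest pair of cluster centroids.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(cents[i], cents[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return [mean(c) for c in clusters]


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def kmeans(points, centroids, max_iter=100):
    """Standard K-Means (Lloyd) iterations starting from the given centroids."""
    for _ in range(max_iter):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the old centroid if a cluster becomes empty.
        new = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


def hierarchical_kmeans(points, k):
    """Hypothetical H-K pipeline: agglomerative seeding, then K-Means refinement."""
    return kmeans(points, agglomerative_seeds(points, k))


if __name__ == "__main__":
    docs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, labels = hierarchical_kmeans(docs, 2)
    print(labels)
```

The naive seeding loop above costs roughly cubic time in the number of documents, which illustrates why the hierarchical stage can dominate the running time of H-K-style methods on large collections; how SAHKC actually reduces that cost is not described in the abstract.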