Document clustering groups related documents into clusters by evaluating the similarity between each document and the cluster representatives. Terms and their discriminating features drive the clustering, and these features are based on term and document frequencies. Feature selection based on frequency statistics alone limits how far a clustering algorithm can be improved, because it does not consider the content of the objects being clustered. In this paper, we adopt a content-based analytic approach to refine the similarity computation and propose a keyword-based clustering algorithm. Experimental results show that content-based keyword weighting outperforms the frequency-based weighting method.
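To make the frequency-based baseline concrete, the following is a minimal sketch of assigning documents to the most similar cluster representative using weights built from term and document frequencies. It is not the keyword-based algorithm proposed in the paper, whose details the abstract does not give. The TF-IDF formula, the cosine measure, the function names (`tfidf_vectors`, `cosine`, `assign`), and the toy documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Weight each term by term frequency times inverse document frequency.

    Assumption: plain TF * log(N / DF); the abstract names no exact formula.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def assign(vector, representatives):
    """Return the index of the most similar cluster representative."""
    scores = [cosine(vector, rep) for rep in representatives]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical toy corpus: two documents about clustering, two about football.
docs = [
    "cluster documents by term similarity",
    "term frequency weights for documents",
    "football match score goal",
    "goal scored in football final",
]
vectors = tfidf_vectors(docs)
# Assumption: the first document of each topic serves as its representative.
representatives = [vectors[0], vectors[2]]
labels = [assign(v, representatives) for v in vectors]
print(labels)
```

Every weight in this sketch comes from counting alone, which is the limitation the abstract points to: two documents that share no surface terms receive a similarity of zero regardless of their content.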