Keyword-based document clustering

Authors:
Seung-Shik Kang
Affiliations:
Kookmin University & AITrc, Chungnung-dong, Songbuk-gu, Seoul, Korea
Venue:
AsianIR '03 Proceedings of the sixth international workshop on Information retrieval with Asian languages - Volume 11
Year:
2003

Abstract

Document clustering groups related documents into clusters by evaluating the similarity between each document and the cluster representatives. Terms and their discriminating features are the key clues for clustering, and these features are usually derived from term and document frequencies. Feature selection based on frequency statistics limits how far a clustering algorithm can be improved, because it does not consider the contents of the objects being clustered. In this paper, we adopt a content-based analytic approach to refine the similarity computation and propose a keyword-based clustering algorithm. Experimental results show that content-based keyword weighting outperforms the frequency-based weighting method.
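To make the frequency-based baseline concrete, the sketch below weights terms with the common tf-idf scheme, w(t, d) = tf(t, d) * log(N / df(t)), and compares each document to cluster representatives (centroids) by cosine similarity in a single-pass clustering loop. This is an illustrative assumption, not the author's algorithm: the abstract does not specify the content-based keyword weighting, and the function names (`tfidf_vectors`, `single_pass_cluster`), the threshold value, and the toy documents are hypothetical. A content-based scheme would replace `tfidf_vectors` while leaving the similarity and assignment steps unchanged.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Frequency-based weighting (assumed tf-idf): w(t, d) = tf(t, d) * log(N / df(t))."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per document
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Cluster representative: the mean of its members' weight vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)  # Counter.update adds the weights term by term
    return {t: w / len(vectors) for t, w in total.items()}


def single_pass_cluster(vectors, threshold):
    """Hypothetical single-pass clustering: join the most similar cluster
    if its similarity reaches the threshold, otherwise start a new cluster."""
    clusters, reps = [], []
    for i, vec in enumerate(vectors):
        best, best_sim = None, 0.0
        for c, rep in enumerate(reps):
            s = cosine(vec, rep)
            if s > best_sim:
                best, best_sim = c, s
        if best is not None and best_sim >= threshold:
            clusters[best].append(i)
            reps[best] = centroid([vectors[j] for j in clusters[best]])
        else:
            clusters.append([i])
            reps.append(dict(vec))
    return clusters


# Toy, pre-tokenized documents (illustrative only).
docs = [
    ["cluster", "document", "keyword"],
    ["document", "cluster", "similarity"],
    ["video", "summarization", "story"],
    ["video", "story", "concept"],
]
vecs = tfidf_vectors(docs)
print(single_pass_cluster(vecs, threshold=0.3))  # [[0, 1], [2, 3]]
```

Documents 0 and 1 share two of their three terms (cosine 1/3), as do documents 2 and 3, while the two groups share none, so a threshold of 0.3 yields two clusters. Because this weighting relies only on frequency counts, two documents that use different words for the same content score zero; that is the kind of limitation the abstract attributes to frequency-based feature selection.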