Term-weighting approaches in automatic text retrieval
Information Processing and Management: an International Journal
Document clustering using word clusters via the information bottleneck method
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data
Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data
Wikify!: linking documents to encyclopedic knowledge
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Building semantic kernels for text classification using wikipedia
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Improving Text Classification by Using Encyclopedia Knowledge
ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
Clustering Documents with Active Learning Using Wikipedia
ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
Supervised and Traditional Term Weighting Methods for Automatic Text Categorization
IEEE Transactions on Pattern Analysis and Machine Intelligence
Clustering Documents Using a Wikipedia-Based Concept Representation
PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Exploiting Wikipedia as external knowledge for document clustering
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Hi-index | 0.00 |
Feature weighting plays an important role in text clustering. Traditional feature weighting is determined by the syntactic relationship between feature and document (e.g. TF-IDF). In this paper, a semantically enriched feature weighting approach is proposed by introducing the semantic relationship between feature and document, which is implemented by taking account of the local feature relatedness -- the relatedness between feature and its contextual features within each individual document. Feature relatedness is measured by two methods, document collection-based implicit relatedness measure and Wikipedia link-based explicit relatedness measure. Experimental results on benchmark data sets show that the new feature weighting approach surpasses traditional syntactic feature weighting. Moreover, clustering quality can be further improved by linearly combining the syntactic and semantic factors. The new feature weighting approach is also compared with two existing feature relatedness-based approaches which consider the global feature relatedness (feature relatedness in the entire feature space) and the interdocument feature relatedness (feature relatedness between different documents) respectively. In the experiments, the new feature weighting approach outperforms these two related work in clustering quality and costs much less computational complexity.