Document Clustering Using Semantic Kernels Based on Term-Term Correlations

  • Authors:
  • Ahmed K. Farahat;Mohamed S. Kamel

  • Venue:
  • ICDMW '09 Proceedings of the 2009 IEEE International Conference on Data Mining Workshops
  • Year:
  • 2009

Abstract

Document clustering algorithms usually use the vector space model (VSM) as their underlying model for document representation. VSM assumes that terms are independent and therefore ignores any semantic relations between them. As a result, documents are mapped to a space where the proximity between document vectors does not reflect their true semantic similarity. In this paper, we propose the use of semantic kernels based on term-term correlations to improve the effectiveness of document clustering algorithms. These kernels measure proximity between documents based on how their terms are statistically correlated. We analyze semantic kernels that capture different aspects of correlations between terms and evaluate them experimentally on several benchmark data sets. Results show that the proposed method achieves significant improvement in document clustering compared to VSM.
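
The general idea of a correlation-based semantic kernel can be illustrated with a minimal sketch. The abstract does not specify the kernels the paper analyzes, so everything below is an assumption made for illustration: the kernel form k(x, y) = x^T S y, the choice of S as the cosine-normalized co-occurrence of terms across documents, the function names, and the toy corpus are all hypothetical and are not taken from the paper.

```python
import math

def term_correlations(docs):
    """Build a terms x terms matrix S from a documents x terms count matrix.

    Assumption for illustration: S[i][j] is the cosine similarity between
    the document-occurrence profiles of term i and term j.
    """
    n_terms = len(docs[0])
    # Represent each term by the column of its counts across all documents.
    columns = [[row[j] for row in docs] for j in range(n_terms)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in columns]
    S = [[0.0] * n_terms for _ in range(n_terms)]
    for i in range(n_terms):
        for j in range(n_terms):
            if norms[i] > 0 and norms[j] > 0:
                dot = sum(a * b for a, b in zip(columns[i], columns[j]))
                S[i][j] = dot / (norms[i] * norms[j])
    return S

def vsm_kernel(x, y):
    """Plain VSM inner product: terms are treated as independent."""
    return sum(a * b for a, b in zip(x, y))

def semantic_kernel(x, y, S):
    """Inner product weighted by term-term correlations: x^T S y."""
    n = len(x)
    return sum(x[i] * S[i][j] * y[j] for i in range(n) for j in range(n))

# Hypothetical toy corpus over the terms: car, automobile, engine, banana.
corpus = [
    [1, 0, 1, 0],  # car, engine
    [0, 1, 1, 0],  # automobile, engine
    [0, 0, 0, 1],  # banana
    [1, 1, 0, 0],  # car, automobile
]
S = term_correlations(corpus)

car_doc = [1, 0, 0, 0]
auto_doc = [0, 1, 0, 0]
fruit_doc = [0, 0, 0, 1]

print(vsm_kernel(car_doc, auto_doc))          # no shared terms under VSM
print(semantic_kernel(car_doc, auto_doc, S))  # correlated terms contribute
print(semantic_kernel(car_doc, fruit_doc, S)) # uncorrelated terms do not
```

In this toy setting, the two documents that share no terms have zero similarity under the plain inner product, while the correlation-weighted kernel gives them a positive similarity (about 0.5) because "car" and "automobile" co-occur in the corpus. A kernel matrix built this way could then be passed to any kernel-based clustering algorithm.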