Text clustering based on granular computing and Wikipedia

  • Authors:
  • Liping Jing; Jian Yu

  • Affiliations:
  • School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China (both authors)

  • Venue:
  • RSKT'11: Proceedings of the 6th International Conference on Rough Sets and Knowledge Technology
  • Year:
  • 2011

Abstract

Text clustering plays an important role in many real-world applications, but it faces various challenges, such as the curse of dimensionality, complex semantics, and large data volumes. Much research has addressed these problems by designing new text representation models and clustering algorithms. However, text clustering remains an open research problem because of the complicated properties of text data. In this paper, a text clustering procedure is proposed based on the principle of granular computing, with the aid of Wikipedia. The proposed method first identifies text granules, focusing especially on concepts and words, with the aid of Wikipedia. It then mines latent patterns by computing over these granules. Experimental results on benchmark data sets (20Newsgroups and Reuters-21578) show that the proposed method improves text clustering performance compared with existing clustering algorithms combined with existing text representation models.
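The abstract describes the pipeline only at a high level: identify word and concept granules with Wikipedia's help, then mine latent patterns from them. The Python sketch below is not the authors' method. It only illustrates the general two-granule idea under loudly stated assumptions: a tiny hard-coded dictionary `CONCEPT_MAP` stands in for Wikipedia concept linking, and TF-IDF weighting plus spherical k-means stand in for the paper's granule computation and pattern mining. All names (`granules`, `tfidf`, `spherical_kmeans`, `cluster_documents`, `DOCS`) are hypothetical.

```python
import math
from collections import Counter

# HYPOTHETICAL stand-in for Wikipedia: maps surface words to concept labels.
# The paper links text to Wikipedia concepts; this tiny dictionary is an assumption.
CONCEPT_MAP = {
    "nasa": "Space_exploration", "rocket": "Space_exploration",
    "orbit": "Space_exploration", "goalie": "Ice_hockey",
    "puck": "Ice_hockey", "nhl": "Ice_hockey",
}


def granules(doc):
    """Return one bag of features holding a word granule (w:) and a concept granule (c:)."""
    words = [w.strip(".,!?").lower() for w in doc.split()]
    words = [w for w in words if w]
    concepts = [CONCEPT_MAP[w] for w in words if w in CONCEPT_MAP]
    return Counter("w:" + w for w in words) + Counter("c:" + c for c in concepts)


def normalize(vec):
    """Scale a sparse vector (dict) to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec.values())) or 1.0
    return {f: x / norm for f, x in vec.items()}


def dot(a, b):
    """Dot product of two sparse vectors; equals cosine similarity for unit vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(x * b.get(f, 0.0) for f, x in a.items())


def tfidf(bags):
    """Weight each feature by term frequency times a smoothed inverse document frequency."""
    n = len(bags)
    df = Counter(f for bag in bags for f in bag)  # number of documents containing f
    return [normalize({f: tf * (1.0 + math.log(n / df[f])) for f, tf in bag.items()})
            for bag in bags]


def spherical_kmeans(vecs, k, iters=20):
    """Cluster unit vectors by cosine similarity, using farthest-first initialization."""
    centers = [vecs[0]]
    while len(centers) < k:
        # Pick the document least similar to its closest existing center.
        far = min(range(len(vecs)), key=lambda i: max(dot(vecs[i], c) for c in centers))
        centers.append(vecs[far])
    labels = [0] * len(vecs)
    for _ in range(iters):
        labels = [max(range(k), key=lambda j: dot(v, centers[j])) for v in vecs]
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == j]
            if not members:
                new_centers.append(centers[j])  # keep an empty cluster's old center
                continue
            total = Counter()
            for v in members:
                total.update(v)  # sum member vectors feature by feature
            new_centers.append(normalize(dict(total)))
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def cluster_documents(docs, k):
    """Granulate, weight, and cluster a list of raw documents."""
    return spherical_kmeans(tfidf([granules(d) for d in docs]), k)


DOCS = [
    "NASA launched a rocket into orbit.",
    "The rocket reached orbit after launch.",
    "The goalie stopped the puck.",
    "An NHL goalie blocked every puck.",
]

if __name__ == "__main__":
    print(cluster_documents(DOCS, 2))
```

On these four toy documents the sketch should place the two space documents in one cluster and the two hockey documents in the other. The concept features (`c:`) add overlap between documents that share few exact words. This is one plausible reading of why concept granules can help, not a claim taken from the paper.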