Text Clustering Algorithm Based on Lexical Graph

  • Authors:
  • Yun Sha;Guoying Zhang;Huina Jiang

  • Affiliations:
  • Beijing Institute of Petrochemical Technology, China;Beijing Institute of Petrochemical Technology, China;Beijing Institute of Petrochemical Technology, China

  • Venue:
  • FSKD '07 Proceedings of the Fourth International Conference on Fuzzy Systems and Knowledge Discovery - Volume 02
  • Year:
  • 2007

Abstract

Text clustering groups texts into thematic clusters and is an important topic in many fields, such as search engines. Well-known text clustering methods, however, do not really address the special problems of the task: the very high dimensionality of the data and the understandability of the cluster descriptions. This paper proposes a text clustering algorithm based on a lexical graph, a kind of term-based clustering method. The lexical graph is built with nodes representing words and edges representing their co-occurrence in text. The attribute of each node is the set of texts in which the word occurs. A cluster center is defined as a node (word) with a large degree in this graph; the texts attributed to the center and to its neighbors are assigned to one cluster, whose description is the center node. This approach drastically reduces the dimensionality of the data and improves the synonymy extension ability. An experimental evaluation on web documents as well as classical text documents demonstrates that the proposed algorithm obtains clusterings of comparable quality significantly more efficiently than the K-Means and STC algorithms on the search results data set. Furthermore, the method provides an understandable description of the discovered clusters by their centers.
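
The construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how centers are selected, how ties are broken, or whether clusters may overlap. The sketch assumes that two words are linked when they appear in the same text, that centers are taken greedily in order of decreasing degree with alphabetical tie-breaking, and that a text already placed in a cluster is not assigned again. The names `build_lexical_graph`, `cluster_by_centers`, and `max_clusters` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_lexical_graph(docs):
    """Build the word graph from a list of tokenized texts.

    Returns (neighbors, postings): neighbors maps each word to the words it
    co-occurs with; postings maps each word to the ids of the texts that
    contain it (the node attribute mentioned in the abstract).
    """
    neighbors = defaultdict(set)
    postings = defaultdict(set)
    for doc_id, tokens in enumerate(docs):
        words = set(tokens)
        for word in words:
            postings[word].add(doc_id)
        # Assumption: an edge links any two words found in the same text.
        for a, b in combinations(words, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors, postings


def cluster_by_centers(docs, max_clusters):
    """Greedily pick high-degree words as cluster centers."""
    neighbors, postings = build_lexical_graph(docs)
    # Assumption: rank by degree, break ties alphabetically.
    ranked = sorted(postings, key=lambda w: (-len(neighbors.get(w, ())), w))
    clusters, assigned = {}, set()
    for center in ranked:
        if len(clusters) == max_clusters:
            break
        # Texts of the center plus texts of all its neighbors.
        members = set(postings[center])
        for word in neighbors.get(center, ()):
            members |= postings[word]
        members -= assigned  # Assumption: each text joins one cluster only.
        if members:
            clusters[center] = sorted(members)
            assigned |= members
    return clusters


docs = [
    ["graph", "node", "edge"],
    ["graph", "edge", "degree"],
    ["oil", "refinery", "pipeline"],
    ["oil", "pipeline", "pump"],
]
print(cluster_by_centers(docs, max_clusters=2))
# {'edge': [0, 1], 'oil': [2, 3]}
```

Each cluster is labeled by its center word, which mirrors the abstract's claim that the center serves as an understandable cluster description. On real text, stop-word removal and a minimum co-occurrence count for edges would likely be needed; both are omitted here.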