On the Chinese document clustering based on dynamical term clustering

  • Authors:
  • Chih-Ming Tseng;Kun-Hsiu Tsai;Chiun-Chieh Hsu;His-Cheng Chang

  • Affiliations:
  • Department of Information Management, National Taiwan University of Science and Technology;Department of Information Management, National Taiwan University of Science and Technology;Department of Information Management, National Taiwan University of Science and Technology;Department of Information Management, National Taiwan University of Science and Technology

  • Venue:
  • AIRS'05 Proceedings of the Second Asia conference on Asia Information Retrieval Technology
  • Year:
  • 2005

Abstract

With the rapid development of global networking, more and more information is accessible online, which makes document clustering techniques increasingly indispensable. Clustering lets us browse large volumes of information efficiently. In this paper, we focus on a Chinese document clustering process that uses data mining techniques and a neural network model. There are two main phases: a preprocessing phase and a clustering phase. In the preprocessing phase, we propose an alternative Chinese sentence segmentation method, which is based on a hash-based data mining technique. In the clustering phase, we adopt the dynamical self-organizing map (SOM) model in order to cluster data dynamically. Furthermore, we cluster term vectors instead of document vectors. Our experiments demonstrate that term clustering yields a better precision rate and becomes more efficient as the number of documents gradually grows.
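
The difference between clustering term vectors and clustering document vectors can be illustrated with a minimal sketch. This is not the authors' implementation: the documents are assumed to be already segmented into tokens (the hash-based segmentation step is not shown), a small fixed-size one-dimensional SOM stands in for the dynamical SOM, and all names (`term_vectors`, `train_som`, `best_unit`) and parameter values are hypothetical.

```python
import math
import random

def term_vectors(docs):
    """Return one vector per term; entry j is the term's count in document j."""
    vocab = sorted({t for d in docs for t in d})
    return {t: [d.count(t) for d in docs] for t in vocab}

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def best_unit(units, v):
    """Index of the unit whose weights are closest to v (squared Euclidean)."""
    return min(range(len(units)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(units[i], v)))

def train_som(vectors, n_units=2, epochs=50, lr=0.5, seed=0):
    """Train a tiny 1-D SOM with a fixed unit count (assumption: not dynamic)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # learning rate decays linearly
        for v in vectors:
            b = best_unit(units, v)
            for i, w in enumerate(units):
                # The winner gets a full update, its direct neighbours half.
                h = 1.0 if i == b else (0.5 if abs(i - b) == 1 else 0.0)
                for k in range(dim):
                    w[k] += rate * h * (v[k] - w[k])
    return units

# Hypothetical pre-segmented documents (two about stocks, two about typhoons).
docs = [["股票", "市場", "上漲"],
        ["股票", "市場", "下跌"],
        ["颱風", "降雨"],
        ["颱風", "降雨", "警報"]]

tv = term_vectors(docs)                       # rows are terms, not documents
vecs = {t: normalize(v) for t, v in tv.items()}
units = train_som(list(vecs.values()))
clusters = {t: best_unit(units, v) for t, v in vecs.items()}
```

In this sketch each vector has one entry per document and there is one vector per term, so the number of items the map has to cluster is set by the vocabulary size, not by the number of documents. Whether that accounts for the efficiency result the abstract reports is not something this toy example can show.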