On the Chinese document clustering based on dynamical term clustering

Authors:
Chih-Ming Tseng;Kun-Hsiu Tsai;Chiun-Chieh Hsu;His-Cheng Chang
Affiliations:
Department of Information Management, National Taiwan University of Science and Technology;Department of Information Management, National Taiwan University of Science and Technology;Department of Information Management, National Taiwan University of Science and Technology;Department of Information Management, National Taiwan University of Science and Technology
Venue:
AIRS'05 Proceedings of the Second Asia conference on Asia Information Retrieval Technology
Year:
2005

Abstract

With the rapid development of global networking, more and more information is accessible online, which makes document clustering techniques increasingly indispensable. Clustering lets us browse large volumes of information efficiently. In this paper, we focus on a Chinese document clustering process that uses a data mining technique and a neural network model. The process has two main phases: a preprocessing phase and a clustering phase. In the preprocessing phase, we propose a new Chinese sentence segmentation method based on a hash-based data mining technique. In the clustering phase, we adopt a dynamical SOM model in order to cluster data dynamically. Furthermore, we cluster term vectors instead of document vectors. Our experiments demonstrate that term clustering yields a better precision rate and is more efficient as the number of documents gradually grows.