A new text clustering method using hidden Markov model

Authors:
Yan Fu;Dongqing Yang;Shiwei Tang;Tengjiao Wang;Aiqiang Gao
Affiliations:
School of Electronics Engineering and Computer Science, Peking University, Beijing, China;School of Electronics Engineering and Computer Science, Peking University, Beijing, China;National Laboratory on Machine Perception, Peking University, Beijing, China;School of Electronics Engineering and Computer Science, Peking University, Beijing, China;School of Electronics Engineering and Computer Science, Peking University, Beijing, China
Venue:
NLDB'07 Proceedings of the 12th international conference on Applications of Natural Language to Information Systems
Year:
2007

Abstract

Because text data are high-dimensional and semantically interrelated, text clustering remains an important topic in data mining. Previous studies, however, have focused mainly on the characteristics of the text itself, and little work has investigated the attributes of the clustering process. Since clustering is a dynamic, sequential process, we describe text clustering as a series of state transitions for words or documents. Taking the K-means clustering method as an example, we parse the clustering process into several sequences. Building on research in sequential and temporal data clustering, we then propose a new text clustering method using a Hidden Markov Model (HMM). Experiments on the Reuters-21578 collection show that this approach produces an accurate clustering partition and achieves better performance than the K-means algorithm.
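The abstract describes parsing a K-means run into sequences of state transitions. The following is a minimal sketch of one possible reading of that step, not the authors' method: it runs plain K-means, records each point's cluster label after every assignment step as that point's state sequence, and pools those sequences into a smoothed first-order Markov transition matrix. It stops short of fitting a Hidden Markov Model. The toy 2-D points (standing in for document vectors), the first-k seeding, and the helper names `kmeans_with_trace` and `transition_matrix` are all hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_with_trace(points, k, iters=10):
    """Plain K-means that also records every point's cluster label after each
    assignment step. Each point's label history is treated as its state sequence."""
    # Naive deterministic seeding (assumption): the first k points are the
    # initial centroids. Practical K-means uses random or k-means++ seeding.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    traces = [[] for _ in points]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            traces[i].append(labels[i])
        # Update step: each centroid moves to the mean of its members;
        # an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, traces


def transition_matrix(traces, k):
    """First-order Markov estimate of label-to-label transition probabilities,
    pooled over all traces, with add-one (Laplace) smoothing."""
    counts = [[1.0] * k for _ in range(k)]
    for seq in traces:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return [[x / sum(row) for x in row] for row in counts]


# Toy 2-D points standing in for document vectors (hypothetical data).
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, traces = kmeans_with_trace(points, k=2, iters=5)
print(labels)      # [0, 0, 0, 1, 1, 1]
print(traces[1])   # [1, 0, 0, 0, 0]: this point moved from cluster 1 to 0
print(transition_matrix(traces, k=2))
```

Recording labels per iteration, rather than only the final partition, is what turns a single clustering run into sequential data; an HMM-based approach would then model these sequences with hidden states instead of the directly observed labels used here.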