In this paper, we propose a new text clustering algorithm named Clustering based on Frequent Word Sequences (CFWS). A word sequence is frequent if it occurs in more than a certain percentage of the documents in the text database. The vector space model has traditionally been used for information retrieval, but it treats each document as a bag of words and ignores the sequential pattern in which words occur. The meaning of natural language, however, depends strongly on word order, and frequent word sequences can provide compact and valuable information about a text database. We evaluate the text clustering performance of the bisecting k-means and FIHC algorithms and compare them with the proposed CFWS algorithm; the experiments show that CFWS performs much better.
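To make the definition of a frequent word sequence concrete, here is a minimal Python sketch. It assumes that documents are already tokenized into word lists, that a word sequence may skip intervening words (order preserved, gaps allowed), and that support is the fraction of documents containing the sequence. The function names, the level-wise search, the `max_len` cap, and the sample documents are hypothetical illustrations, not the CFWS implementation described in the paper.

```python
from typing import Dict, List, Tuple

Sequence = Tuple[str, ...]


def contains_sequence(seq: Sequence, doc: List[str]) -> bool:
    """Return True if the words of seq occur in doc in the same order.

    Assumption: gaps between the words are allowed (subsequence match).
    """
    remaining = iter(doc)
    # `word in remaining` advances the iterator past the first match,
    # so each later word must appear after the earlier ones.
    return all(word in remaining for word in seq)


def support(seq: Sequence, docs: List[List[str]]) -> float:
    """Fraction of documents that contain seq."""
    return sum(contains_sequence(seq, doc) for doc in docs) / len(docs)


def frequent_word_sequences(
    docs: List[List[str]], min_support: float, max_len: int = 3
) -> Dict[Sequence, float]:
    """Level-wise search for sequences whose support exceeds min_support.

    Every prefix of a frequent sequence is frequent, and so is its last
    word, so extending frequent k-sequences by frequent single words finds
    every frequent (k+1)-sequence, up to the hypothetical max_len cap.
    """
    vocab = sorted({word for doc in docs for word in doc})
    result: Dict[Sequence, float] = {}
    frequent_words: List[str] = []
    for word in vocab:
        s = support((word,), docs)
        if s > min_support:  # "more than" the given percentage
            frequent_words.append(word)
            result[(word,)] = s

    level = list(result)
    for _ in range(max_len - 1):
        next_level: List[Sequence] = []
        for seq in level:
            for word in frequent_words:
                candidate = seq + (word,)
                s = support(candidate, docs)
                if s > min_support:
                    result[candidate] = s
                    next_level.append(candidate)
        if not next_level:
            break
        level = next_level
    return result


# Hypothetical sample collection, tokenized by whitespace.
docs = [
    "data mining finds frequent patterns".split(),
    "text mining finds frequent word sequences".split(),
    "frequent word sequences help text clustering".split(),
    "image clustering uses visual features".split(),
]
fws = frequent_word_sequences(docs, min_support=0.4)
for seq, s in sorted(fws.items()):
    if len(seq) > 1:
        print(" ".join(seq), s)
```

With a 40% threshold, a sequence must appear in at least two of the four sample documents, so `mining finds frequent` and `frequent word sequences` are kept while `data` is not. The order sensitivity is the point the abstract makes: `dog bites man` and `man bites dog` have identical bag-of-words vectors but contain different word sequences.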