Text documents occupy sparse data spaces, so under existing proximity measures a document's nearest neighbors may belong to different classes. In this paper, we propose an asymmetric similarity measure that strengthens the discriminative features of document objects. We construct a semantic correlation network from the asymmetric similarity between documents and conjecture that its connection distribution follows a power law. The hub points of the semantic correlation network are grouped by an agglomerative hierarchical clustering approach named SCN. The proximity between hub points is defined using both the similarity of the objects themselves and the similarity of their neighbors. Finally, we assign the remaining text objects to their nearest hub points. An experimental evaluation on textual data sets demonstrates the validity and efficiency of SCN, and a comparison with other clustering algorithms shows the superiority of our approach.
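The pipeline described above (asymmetric similarity, a directed correlation network, hub selection, assignment of the remaining documents) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the measure or procedure defined in the paper: documents are modeled as term sets, `asym_sim` is a simple containment ratio, hubs are taken to be the nodes with the highest in-degree, and the names `build_network`, `pick_hubs`, and `assign_to_hubs` are hypothetical. The agglomerative merging of hub points is omitted.

```python
from typing import Dict, List, Set


def asym_sim(a: Set[str], b: Set[str]) -> float:
    """Assumed measure: fraction of a's terms that also occur in b.

    In general asym_sim(a, b) != asym_sim(b, a), which is the property
    the sketch relies on.
    """
    return len(a & b) / len(a) if a else 0.0


def build_network(docs: List[Set[str]], threshold: float) -> Dict[int, List[int]]:
    """Directed edge i -> j whenever asym_sim(docs[i], docs[j]) >= threshold."""
    n = len(docs)
    return {
        i: [j for j in range(n) if j != i and asym_sim(docs[i], docs[j]) >= threshold]
        for i in range(n)
    }


def pick_hubs(edges: Dict[int, List[int]], n_hubs: int) -> List[int]:
    """Assumed hub criterion: the n_hubs nodes with the highest in-degree."""
    in_degree = {i: 0 for i in edges}
    for targets in edges.values():
        for j in targets:
            in_degree[j] += 1
    # Ties are broken by document index so the result is deterministic.
    return sorted(in_degree, key=lambda i: (-in_degree[i], i))[:n_hubs]


def assign_to_hubs(docs: List[Set[str]], hubs: List[int]) -> List[int]:
    """Label each document with the hub it is most similar to."""
    labels = []
    for i, doc in enumerate(docs):
        if i in hubs:
            labels.append(i)  # a hub labels itself
        else:
            labels.append(max(hubs, key=lambda h: asym_sim(doc, docs[h])))
    return labels


# Toy corpus: two long "topic" documents and four short ones.
docs = [
    {"stock", "market", "price", "trade", "bank"},
    {"stock", "price"},
    {"market", "trade"},
    {"match", "goal", "team", "league", "coach"},
    {"goal", "team"},
    {"league", "coach"},
]

edges = build_network(docs, threshold=0.5)
hubs = pick_hubs(edges, n_hubs=2)
labels = assign_to_hubs(docs, hubs)
print(hubs, labels)
```

In this toy corpus the short documents point at the long ones but not the reverse, so the long documents collect the incoming edges and become hubs. A symmetric measure such as cosine similarity would not produce that directed structure.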