Semantic correlation network based text clustering

Authors:
Shaoxu Song;Chunping Li
Affiliations:
School of Software, Tsinghua University, Beijing, China;School of Software, Tsinghua University, Beijing, China
Venue:
AI'05 Proceedings of the 18th Australian Joint conference on Advances in Artificial Intelligence
Year:
2005

Citing 8
Cited 0

Automatic text processing: the transformation, analysis, and retrieval of information by computer

Automatic text processing: the transformation, analysis, and retrieval of information by computer
Fast and effective text mining using linear-time document clustering

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
ROCK: a robust clustering algorithm for categorical attributes

Information Systems
Data mining: concepts and techniques

Data mining: concepts and techniques
Chameleon: Hierarchical Clustering Using Dynamic Modeling

Computer
RCV1: A New Benchmark Collection for Text Categorization Research

The Journal of Machine Learning Research
Selforganizing classification on the Reuters news corpus

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Self organization of a massive document collection

IEEE Transactions on Neural Networks

Quantified Score

Hi-index	0.00

Visualization

Abstract

Text documents have sparse data spaces, and nearest neighbors may belong to different classes when using current existing proximity measures to describe the correlation of documents. In this paper, we propose an asymmetric similarity measure to strengthen the discriminative feature of document objects. We construct a semantic correlation network by asymmetric similarity between documents and conjecture the power law feature of the connections distributions. Hub points which exist in semantic correlation network are classified by an agglomerative hierarchical clustering approach named SCN. Both objects similarity and neighbors similarity are considered in the definition of hub points proximity. Finally, we assign the rest text objects to their nearest hub points. The experimental evaluation on textual data sets demonstrates the validity and efficiency of SCN. The comparison with other clustering algorithms shows the superiority of our approach.