Words in natural language documents exhibit a small-world network structure, so the algorithms that the physics community has developed for extracting community structure can be applied to them. We present a novel method for semantically clustering a large collection of documents using small-world communities. The method combines modified physics algorithms with traditional information retrieval techniques and proceeds in three steps: a term network is generated from the document collection, the terms are clustered into small-world communities, and the resulting semantic term clusters are used to generate overlapping document clusters. The algorithm combines the speed of single-link clustering with the quality of complete-link clustering. Clustering takes place in nearly real time, and expert users judge the results to be coherent. The algorithm thus occupies a middle ground between speed and quality in document clustering.
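
The three steps named in the abstract (term network, term communities, overlapping document clusters) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the modified physics algorithms, so connected components of a thresholded co-occurrence graph stand in for community detection here. The function names, the `min_weight` and `min_overlap` thresholds, and the toy documents are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_term_network(docs, min_weight=2):
    """Link two terms if they co-occur in at least min_weight documents (assumed rule)."""
    weights = defaultdict(int)
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        for a, b in combinations(terms, 2):
            weights[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), w in weights.items():
        if w >= min_weight:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def term_communities(graph):
    """Stand-in for community detection: connected components of the term graph."""
    seen, communities = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        communities.append(component)
    return communities


def cluster_documents(docs, communities, min_overlap=2):
    """Put a document in every community it shares at least min_overlap terms with."""
    clusters = defaultdict(list)
    for i, doc in enumerate(docs):
        terms = set(doc.lower().split())
        for c, community in enumerate(communities):
            if len(terms & community) >= min_overlap:
                clusters[c].append(i)
    return dict(clusters)


# Hypothetical toy collection; the last document mixes both topics.
docs = [
    "graph network community",
    "network community structure graph",
    "query index retrieval",
    "retrieval index ranking query",
    "graph network query index",
]
graph = build_term_network(docs)
communities = term_communities(graph)
clusters = cluster_documents(docs, communities)
print(communities)
print(clusters)
```

In this toy run the last document shares two terms with each term community, so it lands in both document clusters, which is the kind of overlap the abstract describes. A real community detection method would presumably also split densely connected subgroups inside a single component, which this stand-in cannot do.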