International Journal of Open Source Software and Processes
A lightweight document clustering method is described that operates in high dimensions, processes tens of thousands of documents, and groups them into several thousand clusters or, by varying a single parameter, into a few dozen clusters. The method uses a reduced indexing view of the original documents, in which only the k best keywords of each document are indexed. An efficient clustering procedure is specified in two parts: (a) compute the k most similar documents for each document in the collection, and (b) group the documents into clusters using these similarity scores. The method has been evaluated on a database of over 50,000 customer service problem reports, which are reduced to 3,000 clusters and 5,000 exemplar documents. Results demonstrate efficient clustering performance with excellent group similarity measures.
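The two-part procedure can be illustrated with a minimal Python sketch. The abstract does not say how the best keywords are chosen, how similarity is scored, or how documents are grouped, so everything concrete here is an assumption made for illustration: TF-IDF weighting for keyword selection, the count of shared keywords as the similarity score, and union-find grouping controlled by a hypothetical `min_score` threshold. The function names are likewise hypothetical, and the same k is reused for keywords and neighbors only for brevity.

```python
import math
from collections import Counter, defaultdict


def top_keywords(docs, k):
    """Reduced indexing view: keep the k highest-weighted terms per document.

    TF-IDF weighting is an assumption; the abstract only says "k best keywords".
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    reduced = []
    for doc in docs:
        tf = Counter(doc)
        # Rank by weight (descending), breaking ties alphabetically.
        ranked = sorted(
            tf, key=lambda t: (-tf[t] * math.log((1 + n) / (1 + df[t])), t)
        )
        reduced.append(set(ranked[:k]))
    return reduced


def nearest_neighbors(reduced, k):
    """Part (a): for each document, find up to k most similar documents.

    Similarity is the number of shared keywords (an assumption), computed
    through an inverted index so only documents sharing a keyword are scored.
    """
    index = defaultdict(list)
    for doc_id, terms in enumerate(reduced):
        for term in terms:
            index[term].append(doc_id)
    neighbors = []
    for doc_id, terms in enumerate(reduced):
        scores = Counter()
        for term in terms:
            for other in index[term]:
                if other != doc_id:
                    scores[other] += 1
        best = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
        neighbors.append(best)
    return neighbors


def group(neighbors, min_score):
    """Part (b): merge documents linked by a score >= min_score.

    Union-find grouping and the min_score threshold are assumptions.
    """
    parent = list(range(len(neighbors)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for doc_id, best in enumerate(neighbors):
        for other, score in best:
            if score >= min_score:
                parent[find(doc_id)] = find(other)
    clusters = defaultdict(list)
    for doc_id in range(len(neighbors)):
        clusters[find(doc_id)].append(doc_id)
    return sorted(clusters.values())


def cluster(docs, k, min_score):
    """Run the full sketch on a list of tokenized documents."""
    return group(nearest_neighbors(top_keywords(docs, k), k), min_score)


docs = [
    ["printer", "jam", "paper", "printer"],
    ["printer", "paper", "tray", "jam"],
    ["password", "reset", "login"],
    ["login", "password", "expired"],
]
print(cluster(docs, k=3, min_score=2))
```

In this sketch, raising `min_score` yields more, smaller clusters and lowering it yields fewer, larger ones, which loosely mirrors the single-parameter control mentioned in the abstract; whether the original method uses a threshold of this kind is not stated.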