Implementing Agglomerative Hierarchic Clustering Algorithms for Use in Document Retrieval

Authors:
Ellen M. Voorhees
Venue:
Implementing Agglomerative Hierarchic Clustering Algorithms for Use in Document Retrieval
Year:
1986

Citing 0
Cited 8

A cluster-based approach to thesaurus construction
  SIGIR '88 Proceedings of the 11th annual international ACM SIGIR conference on Research and development in information retrieval

The automatic generation of extended queries
  SIGIR '90 Proceedings of the 13th annual international ACM SIGIR conference on Research and development in information retrieval

Experiments in automatic statistical thesaurus construction
  SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval

Modelling highly inflected languages
  Information Sciences—Informatics and Computer Science: An International Journal

Re-ranking algorithm using post-retrieval clustering for content-based image retrieval
  Information Processing and Management: an International Journal

Cross-language linking of news stories on the web using interlingual topic modelling
  Proceedings of the 2nd ACM workshop on Social web search and mining

The effect of collection fusion strategies on information seeking performance in distributed hypermedia digital libraries
  ECDL'05 Proceedings of the 9th European conference on Research and Advanced Technology for Digital Libraries

Representations for multi-document event clustering
  Data Mining and Knowledge Discovery

Abstract

Searching hierarchically clustered document collections can be effective, but creating the cluster hierarchies is expensive because there are both many documents and many terms. However, the document-term matrix is sparse: documents are usually indexed by relatively few terms. This paper describes implementations of three agglomerative hierarchic clustering algorithms that exploit this sparsity, so that collections much larger than the algorithms' worst-case running times would suggest can be clustered. The implementations described in the paper have been used to cluster a collection of 12,000 documents.
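
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of exploiting sparsity, not the paper's method. It assumes documents are stored as term-to-weight dictionaries holding nonzero entries only, uses an inverted index so that only document pairs sharing at least one term are ever compared, and then runs single-link clustering as one illustrative agglomerative method (the abstract does not name the three algorithms). The names `sparse_similarities` and `single_link` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def sparse_similarities(docs):
    """Cosine similarity for document pairs that share at least one term.

    docs: list of dicts mapping term -> nonzero weight (assumed layout).
    Pairs with no term in common have similarity 0 and are never visited.
    """
    # Inverted index: term -> list of (doc_id, weight), in doc_id order.
    postings = defaultdict(list)
    for doc_id, terms in enumerate(docs):
        for term, weight in terms.items():
            postings[term].append((doc_id, weight))

    norms = [sqrt(sum(w * w for w in terms.values())) for terms in docs]

    # Accumulate dot products only over co-occurring terms.
    dots = defaultdict(float)
    for plist in postings.values():
        for (i, wi), (j, wj) in combinations(plist, 2):
            dots[(i, j)] += wi * wj

    return {(i, j): dot / (norms[i] * norms[j]) for (i, j), dot in dots.items()}


def single_link(n_docs, sims):
    """Single-link merges, processing nonzero pairs in decreasing similarity.

    Returns a list of (doc_i, doc_j, similarity) tuples, one per merge.
    Clusters that share no terms with any other cluster stay unmerged.
    """
    parent = list(range(n_docs))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    merges = []
    for (i, j), sim in sorted(sims.items(), key=lambda kv: -kv[1]):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i
            merges.append((i, j, sim))
    return merges


# Toy collection (made-up data): two pairs of documents with disjoint vocabularies.
docs = [
    {"cluster": 1.0, "retrieval": 1.0},
    {"cluster": 1.0, "hierarchy": 1.0},
    {"sparse": 1.0, "matrix": 1.0},
    {"sparse": 1.0, "index": 1.0},
]
sims = sparse_similarities(docs)
print(single_link(len(docs), sims))
```

In this sketch the work is proportional to the number of co-occurring document pairs per term rather than to all pairs of documents, which is where the sparsity of the document-term matrix pays off.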