Searching hierarchically clustered document collections can be effective, but creating the cluster hierarchies is expensive because there are many documents and many terms. However, the document-term matrix is sparse: each document is usually indexed by relatively few terms. This paper describes implementations of three agglomerative hierarchic clustering algorithms that exploit this sparsity, so that collections much larger than the algorithms' worst-case running times would suggest can be clustered. The implementations have been used to cluster a collection of 12,000 documents.
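
To make the sparsity argument concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not name the three algorithms or the similarity measure, so the choice of single-link clustering, the dot-product similarity, and the function names `sparse_similarities` and `single_link` are all assumptions made for illustration. The idea shown is that an inverted index lets the code visit only document pairs that share at least one term, so pairs with zero similarity are never computed or stored.

```python
from collections import defaultdict
from itertools import combinations


def sparse_similarities(docs):
    """Dot-product similarity for every document pair that shares a term.

    docs: list of {term: weight} dicts, one sparse row per document.
    Pairs with no term in common are never visited, so their implicit
    similarity of zero costs nothing. (Dot product is an assumed measure.)
    """
    postings = defaultdict(list)  # term -> [(doc_id, weight), ...]
    for doc_id, terms in enumerate(docs):
        for term, weight in terms.items():
            postings[term].append((doc_id, weight))

    sims = defaultdict(float)
    for entries in postings.values():
        # Postings are in ascending doc_id order, so i < j for every pair.
        for (i, wi), (j, wj) in combinations(entries, 2):
            sims[(i, j)] += wi * wj
    return sims


def single_link(docs):
    """Return merges as (similarity, kept_root, absorbed_root), strongest first.

    Single link is an illustrative choice: processing pairs in decreasing
    similarity and merging with union-find yields the single-link hierarchy
    over the pairs that have nonzero similarity.
    """
    parent = list(range(len(docs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    merges = []
    sims = sparse_similarities(docs)
    for (i, j), s in sorted(sims.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            merges.append((s, ri, rj))
    return merges
```

In this sketch a document that shares no terms with any other is never merged, so the result may be a forest rather than a single tree. The work done is proportional to the number of co-occurring pairs in the postings lists rather than to the square of the collection size, which is the kind of saving the abstract attributes to sparsity.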