Large-scale clustering and complete facet and tag calculation

Authors:
Bolette Ammitzbøll Madsen
Affiliations:
The Digital Resources and Web Group, The State and University Library of Denmark
Venue:
ECDL'07 Proceedings of the 11th European conference on Research and Advanced Technology for Digital Libraries
Year:
2007

Citing 6
Cited 0

Scatter/Gather: a cluster-based approach to browsing large document collections

SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Fast and effective text mining using linear-time document clustering

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Probabilistic latent semantic indexing

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Hierarchical faceted metadata in site search interfaces

CHI '02 Extended Abstracts on Human Factors in Computing Systems
Probabilistic topic decomposition of an eighteenth-century American newspaper

Journal of the American Society for Information Science and Technology
MapReduce: simplified data processing on large clusters

OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6

Quantified Score

Hi-index	0.00

Visualization

Abstract

The State and University Library of Denmark is developing an integrated search system called Summa, and as part of the Summa project a clustering module and a facet module. Simple clusters have been created for a collection of more than six and a half million library metadata records using a linear clustering algorithm. The created clusters are used to enrich the metadata records, and search results are presented to the user using a faceted browsing interface alongside a ranked result list. The most frequent tags in the different facets in the search result can be calculated and presented at a rate of approximately three million records per second per machine.