Large-scale clustering and complete facet and tag calculation

  • Authors:
  • Bolette Ammitzbøll Madsen

  • Affiliations:
  • The Digital Resources and Web Group, The State and University Library of Denmark

  • Venue:
  • ECDL'07 Proceedings of the 11th European conference on Research and Advanced Technology for Digital Libraries
  • Year:
  • 2007

Abstract

The State and University Library of Denmark is developing an integrated search system called Summa, and a clustering module and a facet module are being developed as part of the Summa project. Simple clusters have been created for a collection of more than six and a half million library metadata records using a linear clustering algorithm. The created clusters are used to enrich the metadata records, and search results are presented to the user through a faceted browsing interface alongside a ranked result list. The most frequent tags in the different facets of a search result can be calculated and presented at a rate of approximately three million records per second per machine.
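
To illustrate what calculating the most frequent tags per facet for a search result involves, here is a minimal sketch in Python. It is not the Summa implementation, which the abstract does not describe; the function name `top_tags`, the record layout (a dict mapping facet names to lists of tags), and the sample data are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def top_tags(records, result_ids, facets, k=3):
    """Return the k most frequent tags per facet among the records in a search result.

    records    -- hypothetical store: record id -> {facet name: [tags]}
    result_ids -- ids of the records matched by a search
    facets     -- facet names to tally (e.g. "author", "subject")
    """
    counts = defaultdict(Counter)
    for rid in result_ids:
        record = records[rid]
        for facet in facets:
            # A record may carry zero or more tags for a given facet.
            counts[facet].update(record.get(facet, ()))
    return {facet: counts[facet].most_common(k) for facet in facets}


# Illustrative data only; not taken from the library's collection.
records = {
    1: {"author": ["Andersen"], "subject": ["history", "denmark"]},
    2: {"author": ["Andersen"], "subject": ["history"]},
    3: {"author": ["Blixen"], "subject": ["fiction"]},
}
result = top_tags(records, [1, 2, 3], ["author", "subject"], k=2)
```

The sketch visits every record in the result once and counts its tags, so the work grows linearly with the size of the result set. The throughput reported in the abstract would depend on a far more compact in-memory representation than the Python dictionaries used here.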