Indexing and searching tera-scale Grid-Based Digital Libraries

Authors:
Robert Sanderson; Ray R. Larson
Affiliations:
University of Liverpool, Liverpool, U.K.; University of California, Berkeley, California
Venue:
InfoScale '06 Proceedings of the 1st international conference on Scalable information systems
Year:
2006

Citing (5):
Cheshire II: designing a next-generation online catalog. Journal of the American Society for Information Science - Special issue: current research in online public access systems
The Grid 2: Blueprint for a New Computing Infrastructure
Grid-based digital libraries: cheshire3 and distributed retrieval. Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries
Scientific workflow management and the Kepler system: Research Articles. Concurrency and Computation: Practice & Experience - Workflow in Grid Systems
A no-compromises architecture for digital document preservation. ECDL'05 Proceedings of the 9th European conference on Research and Advanced Technology for Digital Libraries

Cited by (5):
Cheshire3: retrieving from tera-scale grid-based digital libraries. SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Integrating data and text mining processes for digital library applications. Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
Making web annotations persistent over time. Proceedings of the 10th annual joint conference on Digital libraries
Knowledge generation from digital libraries and persistent archives. ECDL'06 Proceedings of the 10th European conference on Research and Advanced Technology for Digital Libraries
Global web archive integration with memento. Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries

Abstract

The University of California, Berkeley and the University of Liverpool, in conjunction with the San Diego Supercomputer Center, are developing Cheshire3, a framework for Grid-based digital library systems and information retrieval services that operates in both single-processor and distributed computing environments. In this paper we discuss results from testing Grid-based parallel approaches to indexing and retrieval across a variety of information resources: small test collections such as the TREC and INEX collections; medium-scale metadata collections such as Medline and a test version of the University of California online union catalog, MELVYL (15 million and 16.5 million records, respectively); and large-scale collections such as the US National Archives and Records Administration (NARA) Preservation Prototype. We examine our approaches to indexing and retrieving from these collections and the architecture of the system that supports them.