The open archives initiative: building a low-barrier interoperability framework
Proceedings of the 1st ACM/IEEE-CS joint conference on Digital libraries
Arc: an OAI service provider for cross-archive searching
Proceedings of the 1st ACM/IEEE-CS joint conference on Digital libraries
DP9: an OAI gateway service for web crawlers
Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries
A Metadata Catalog Service for Data Intensive Applications
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
The Anatomy of the Grid: Enabling Scalable Virtual Organizations
International Journal of High Performance Computing Applications
Improving the performance of Federated Digital Library services
Future Generation Computer Systems
Information Sciences: an International Journal
With the growing acceptance of the Open Archives Initiative (OAI) [16] framework, a number of digital libraries are becoming OAI compliant. This makes it feasible to build an effective federated digital library, which harvests metadata from OAI-compliant libraries and provides a unified search service over the aggregated metadata. Arc [10] is an example of such a federated digital library. If adoption of OAI-PMH [16] increases rapidly (e.g., by several orders of magnitude), a different problem arises: how to efficiently discover, harvest, and index the burgeoning OAI-PMH corpus. In this project, we are applying Grid and cluster technology to address these performance issues. In this paper, we focus on using the Grid to parallelize the harvesting task for an OAI-based federated digital library. We propose a Grid-based architecture for parallel harvesting that supports dynamic allocation of harvesting nodes, scheduling of harvesting tasks to maximize performance, and uniform load distribution for the indexing node. We have implemented and evaluated the proposed architecture on a Grid based on the GT3 toolkit.
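To make the idea of scheduling harvesting tasks across nodes concrete, the following is a minimal sketch of one possible policy, not the scheduler described in the paper. It assumes that each archive's size (for example, an estimated record count) is known in advance and uses a greedy longest-processing-time heuristic to spread archives over a fixed number of harvesting nodes. The names `schedule_harvests`, `node_loads`, and `archive_sizes` are hypothetical and introduced only for illustration.

```python
import heapq
from typing import Dict, List, Tuple


def schedule_harvests(archive_sizes: Dict[str, int], num_nodes: int) -> List[List[str]]:
    """Assign archives to harvesting nodes with a greedy LPT heuristic.

    archive_sizes maps a (hypothetical) archive name to an estimated
    record count; the result lists the archives given to each node.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")

    # Min-heap of (assigned load, node index): the least-loaded node is on top.
    heap: List[Tuple[int, int]] = [(0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    plan: List[List[str]] = [[] for _ in range(num_nodes)]

    # Place the largest archives first so the smaller ones can even out the loads.
    # Ties on size are broken by name to keep the result deterministic.
    ordered = sorted(archive_sizes.items(), key=lambda item: (-item[1], item[0]))
    for name, size in ordered:
        load, node = heapq.heappop(heap)
        plan[node].append(name)
        heapq.heappush(heap, (load + size, node))
    return plan


def node_loads(plan: List[List[str]], archive_sizes: Dict[str, int]) -> List[int]:
    """Return the total estimated record count assigned to each node."""
    return [sum(archive_sizes[name] for name in names) for names in plan]
```

A real deployment would also need to handle nodes that join or leave dynamically and archives whose size is unknown before harvesting; this sketch only illustrates the static load-balancing step.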