Parallelising harvesting

Authors:
Hussein Suleman
Affiliations:
Department of Computer Science, University of Cape Town, Rondebosch, South Africa
Venue:
ICADL'06: Proceedings of the 9th International Conference on Asian Digital Libraries: Achievements, Challenges and Opportunities
Year:
2006

Abstract

Metadata harvesting has become a common technique to transfer a stream of data from one metadata repository or digital library system to another. As collections of metadata, and their associated digital objects, grow in size, the ingest of these items at the destination archive can take a significant amount of time, depending on the type of indexing or post-processing that is required. This paper discusses an approach to parallelise the post-processing of data in a small cluster of machines or a multi-processor environment, while not increasing the burden on the source data provider. Performance tests have been carried out on varying architectures, and the results indicate that this technique is indeed promising for some scenarios and can be extended to more computationally intensive ingest procedures. In general, the technique presents a new approach for the construction of harvest-based distributed or component-based digital libraries, with better scalability than before.
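The core idea (harvest the source once, then spread the post-processing over several workers) can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes a simulated harvester that yields records in batches, much as one sequential pass over a provider might return one page of records per request. It also assumes that a small inverted index is an acceptable stand-in for "indexing or post-processing". The functions `harvest`, `post_process`, `merge` and `parallel_ingest`, and the `oai:example:` identifiers, are hypothetical. No network or OAI-PMH calls are made.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterator, List, Set, Tuple

Record = Tuple[str, str]  # hypothetical (identifier, metadata text) pair
Index = Dict[str, Set[str]]  # term -> set of record identifiers


def harvest(n_records: int, batch_size: int) -> Iterator[List[Record]]:
    """Simulated single harvester: the source is read once, sequentially,
    and records are grouped into batches (one batch ~ one response page)."""
    batch: List[Record] = []
    for i in range(n_records):
        batch.append((f"oai:example:{i}", f"Title of record {i}"))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def post_process(batch: List[Record]) -> Index:
    """Hypothetical ingest step run by a worker: index one batch."""
    index: Index = {}
    for identifier, text in batch:
        for term in text.lower().split():
            index.setdefault(term, set()).add(identifier)
    return index


def merge(partials) -> Index:
    """Combine the per-batch partial indexes into one destination index."""
    merged: Index = {}
    for part in partials:
        for term, ids in part.items():
            merged.setdefault(term, set()).update(ids)
    return merged


def parallel_ingest(n_records: int, batch_size: int, workers: int) -> Index:
    """Harvest once; hand each batch to a pool of workers; merge results.
    Threads keep the sketch portable. For CPU-bound ingest on a
    multi-processor, ProcessPoolExecutor (under a __main__ guard) or
    separate cluster nodes would be needed for real speed-up."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return merge(pool.map(post_process, harvest(n_records, batch_size)))


if __name__ == "__main__":
    index = parallel_ingest(n_records=10, batch_size=3, workers=2)
    print(len(index["of"]))     # 10
    print(sorted(index["7"]))   # ['oai:example:7']
```

The source provider sees only the single sequential `harvest` pass, whatever the number of workers. Only the destination-side work is parallelised, and that matches the constraint stated in the abstract.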