A Service for Data-Intensive Computations on Virtual Clusters

Authors:
Rainer Schmidt, Christian Sadilek, Ross King
Venue:
INTENSIVE '09 Proceedings of the 2009 First International Conference on Intensive Applications and Services
Year:
2009

Abstract

Digital preservation deals with the long-term storage, access, and maintenance of digital data objects. To prevent a loss of information, digital libraries and archives increasingly need to preserve vast amounts of data electronically while having only limited computational resources in-house. Because of the potentially immense data sets and computationally intensive tasks involved, preservation systems have become a recognized challenge for e-science. We argue that grid and cloud technologies can provide the key building blocks for scalable preservation systems. In this paper, we present recent developments on a Job Submission Service that is based on standard grid mechanisms and is capable of provisioning a large cluster of virtual machines. The service allows clients to specify and execute preservation tools on large data sets based on dynamically generated job descriptors. This approach allows us to use a cloud infrastructure based on platform virtualization as a scaling environment for the execution of preservation workflows. Finally, we present results from experiments conducted on the Amazon EC2 and S3 utility cloud infrastructure.
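The abstract does not specify the job descriptor format. The following is a minimal sketch, assuming a descriptor loosely modeled on the OGF Job Submission Description Language (JSDL) and its POSIX application extension. It shows how a service might dynamically generate one descriptor per slice of a data set, so that each virtual machine in the cluster runs the preservation tool on its own slice. The tool path, its arguments, the `s3://example-bucket` URIs, the helper names (`partition`, `make_job_descriptor`, `make_jobs`), and the even-split partitioning strategy are all hypothetical illustrations, not the authors' implementation.

```python
import xml.etree.ElementTree as ET

# Namespaces of OGF JSDL 1.0 and its POSIX application extension (assumed format).
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"
ET.register_namespace("jsdl", JSDL_NS)
ET.register_namespace("jsdl-posix", POSIX_NS)


def _q(ns, tag):
    """Return an ElementTree-qualified tag name, e.g. '{ns}JobName'."""
    return f"{{{ns}}}{tag}"


def partition(items, n_parts):
    """Split items into at most n_parts contiguous chunks of near-equal size."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    k, r = divmod(len(items), n_parts)
    chunks, start = [], 0
    for p in range(n_parts):
        end = start + k + (1 if p < r else 0)
        if end > start:  # skip empty chunks when there are fewer items than parts
            chunks.append(items[start:end])
        start = end
    return chunks


def make_job_descriptor(job_name, tool, tool_args, input_uris):
    """Build a JSDL-style descriptor that runs `tool` on the staged inputs.

    Hypothetical layout: each input is staged to a local file that is
    passed to the tool as an extra command-line argument.
    """
    job_def = ET.Element(_q(JSDL_NS, "JobDefinition"))
    desc = ET.SubElement(job_def, _q(JSDL_NS, "JobDescription"))

    ident = ET.SubElement(desc, _q(JSDL_NS, "JobIdentification"))
    ET.SubElement(ident, _q(JSDL_NS, "JobName")).text = job_name

    app = ET.SubElement(desc, _q(JSDL_NS, "Application"))
    posix = ET.SubElement(app, _q(POSIX_NS, "POSIXApplication"))
    ET.SubElement(posix, _q(POSIX_NS, "Executable")).text = tool
    for arg in tool_args:
        ET.SubElement(posix, _q(POSIX_NS, "Argument")).text = arg

    for i, uri in enumerate(input_uris):
        local_name = f"input_{i}"
        # Pass each staged file to the tool as an extra argument.
        ET.SubElement(posix, _q(POSIX_NS, "Argument")).text = local_name
        # Ask the execution node to fetch the object from `uri` before running.
        staging = ET.SubElement(desc, _q(JSDL_NS, "DataStaging"))
        ET.SubElement(staging, _q(JSDL_NS, "FileName")).text = local_name
        ET.SubElement(staging, _q(JSDL_NS, "CreationFlag")).text = "overwrite"
        source = ET.SubElement(staging, _q(JSDL_NS, "Source"))
        ET.SubElement(source, _q(JSDL_NS, "URI")).text = uri

    return job_def


def make_jobs(tool, tool_args, input_uris, n_nodes):
    """Generate one descriptor per data-set slice, i.e. one job per virtual machine."""
    return [
        make_job_descriptor(f"preservation-job-{i}", tool, tool_args, chunk)
        for i, chunk in enumerate(partition(input_uris, n_nodes))
    ]


if __name__ == "__main__":
    uris = [f"s3://example-bucket/object-{i}.tif" for i in range(10)]
    jobs = make_jobs("/opt/tools/migrate", ["--target-format", "jp2"], uris, n_nodes=4)
    for job in jobs:
        print(ET.tostring(job, encoding="unicode"))
```

Splitting the input into contiguous, near-equal slices keeps each generated descriptor independent, so the descriptors could be dispatched to virtual machines in any order. A production service would more likely balance work by object size or hand scheduling to a cluster framework; whether worker nodes can resolve `s3://` URIs directly is likewise an assumption of this sketch.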