A Service for Data-Intensive Computations on Virtual Clusters

  • Authors:
  • Rainer Schmidt; Christian Sadilek; Ross King

  • Venue:
  • INTENSIVE '09: Proceedings of the 2009 First International Conference on Intensive Applications and Services
  • Year:
  • 2009

Abstract

Digital preservation deals with the long-term storage, access, and maintenance of digital data objects. To prevent loss of information, digital libraries and archives increasingly need to preserve vast amounts of data electronically while having only limited computational resources in-house. Because of the potentially immense data sets and computationally intensive tasks involved, building preservation systems has become a recognized challenge for e-science. We argue that grid and cloud technology can provide the crucial foundation for building scalable preservation systems. In this paper, we present recent developments on a Job Submission Service that is based on standard grid mechanisms and is capable of provisioning a large cluster of virtual machines. The service allows clients to specify and execute preservation tools on large data sets based on dynamically generated job descriptors. This approach allows us to use a cloud infrastructure based on platform virtualization as a scaling environment for the execution of preservation workflows. Finally, we present the results of experiments conducted on the Amazon EC2 and S3 utility cloud infrastructure.