Fault Tolerance Mechanisms for SLA-aware Resource Management

  • Authors: Matthias Hovestadt
  • Affiliations: Paderborn Center for Parallel Computing (PC2), Universität Paderborn, Germany
  • Venue: ICPADS '05 Proceedings of the 11th International Conference on Parallel and Distributed Systems - Workshops - Volume 02
  • Year: 2005

Abstract

Future Grid systems will demand properties such as runtime responsibility, predictability, and a guaranteed level of service quality. In this context, Service Level Agreements (SLAs) will be of central importance. Many ongoing research projects already focus on realizing the required mechanisms at the Grid middleware layer. However, concentrating on Grid middleware alone is not enough. The underlying resource management systems must also provide an increased QoS level, since they supply their resources to Grid environments. The EU-funded project HPC4U aims to realize an SLA-aware resource management system. It allows the Grid user to negotiate SLAs and ensures adherence to agreed SLAs by means of application-transparent checkpointing, snapshotting, and migration.