Strategies for storage of checkpointing data using non-dedicated repositories on Grid systems

  • Authors:
  • Raphael Y. de Camargo; Renato Cerqueira; Fabio Kon

  • Affiliations:
  • University of São Paulo, Brazil; PUC-Rio, Brazil; University of São Paulo, Brazil

  • Venue:
  • MGC '05: Proceedings of the 3rd International Workshop on Middleware for Grid Computing
  • Year:
  • 2005

Abstract

Dealing with the large amounts of data generated by long-running parallel applications is one of the most challenging aspects of Grid Computing. Periodic checkpoints may be taken to guarantee application progress, producing even more data. The classical approach is to employ high-throughput checkpoint servers connected to the computational nodes by high-speed networks. In Opportunistic Grid Computing, however, we do not want to be forced to rely on such dedicated hardware. Instead, we want to use the shared Grid nodes to store application data in a distributed fashion.

In this work, we evaluate several strategies for storing checkpoints on distributed, non-dedicated repositories. We consider the tradeoff among computational overhead, storage overhead, and degree of fault tolerance of these strategies. We compare the use of replication, parity information, and information dispersal (IDA). We used InteGrade, an object-oriented Grid middleware, to implement the storage strategies and perform evaluation experiments.
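To make the storage/fault-tolerance tradeoff concrete, the following is a minimal sketch, not the InteGrade implementation. It assumes a checkpoint is split into `m` data fragments, that replication stores `k` full copies, and that IDA produces `n` fragments of which any `m` suffice for reconstruction. The function names (`parity_encode`, `parity_recover`, `overheads`) and the parameters `m`, `n`, `k` are hypothetical. Only XOR parity is actually encoded here; IDA requires finite-field arithmetic (e.g. Rabin's dispersal scheme), so it appears only in the cost model.

```python
from functools import reduce


def split_into_fragments(data: bytes, m: int) -> list[bytes]:
    """Split data into m equal-size fragments, zero-padding the last one."""
    size = -(-len(data) // m)  # ceiling division
    padded = data.ljust(size * m, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(m)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_encode(data: bytes, m: int) -> list[bytes]:
    """Return m data fragments followed by one XOR parity fragment.

    Each of the m + 1 fragments would go to a different (hypothetical) node.
    """
    frags = split_into_fragments(data, m)
    parity = reduce(xor_bytes, frags)
    return frags + [parity]


def parity_recover(fragments: list[bytes], lost: int) -> bytes:
    """Rebuild the single fragment at index `lost` from all the others.

    Because parity = XOR of all data fragments, XOR-ing every surviving
    fragment (data and parity) yields the missing one.
    """
    survivors = [f for i, f in enumerate(fragments) if i != lost]
    return reduce(xor_bytes, survivors)


def overheads(m: int, n: int, k: int) -> dict:
    """Storage overhead (stored bytes / checkpoint bytes) and tolerated node failures."""
    return {
        "replication": {"storage": k, "tolerated_failures": k - 1},
        "parity": {"storage": (m + 1) / m, "tolerated_failures": 1},
        "ida": {"storage": n / m, "tolerated_failures": n - m},
    }


if __name__ == "__main__":
    checkpoint = b"checkpoint-state-0123456789"
    frags = parity_encode(checkpoint, 4)
    # Simulate losing the node that held fragment 2 and rebuild it.
    assert parity_recover(frags, 2) == frags[2]
    print(overheads(m=4, n=6, k=3))
```

With these example parameters, replication stores three times the checkpoint size to survive two failures, parity stores 1.25 times the size but survives only one, and IDA stores 1.5 times the size while surviving two. IDA pays for this balance with extra encoding and decoding computation, which is the computational-overhead axis of the tradeoff described above.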