Diskless Checkpointing with Rollback-Dependency Trackability

  • Authors:
  • Raphael Marcos Menderico; Islene Calciolari Garcia

  • Venue:
  • SRDS '10 Proceedings of the 2010 29th IEEE Symposium on Reliable Distributed Systems
  • Year:
  • 2010

Abstract

One way to implement fault-tolerant applications is to store their current state in stable memory and, when a failure occurs, restart the application from the last consistent global state. If the number of simultaneous failures is expected to be small, a diskless checkpointing approach can be used, in which a failed process's state can be reconstructed by accessing only the memory of non-faulty processes. In the literature, diskless checkpointing is usually based on synchronous protocols or on properties of the application. In this paper we present a quasi-synchronous diskless checkpointing algorithm, called RDT-Diskless, based on Rollback-Dependency Trackability. The proposed algorithm includes a garbage collection approach that limits the number of checkpoints that must be kept in memory. A framework, called Cheops, was developed, and experimental results were obtained from a commercial cloud environment.
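
To illustrate the general idea of recovering a failed process's state from the memory of surviving processes, the following is a minimal sketch of XOR-parity encoding, a technique commonly associated with diskless checkpointing. It is an assumption for illustration only: the abstract does not say which encoding RDT-Diskless or Cheops uses, and the names `xor_parity` and `recover`, as well as the sample states, are hypothetical. The sketch tolerates a single failure and assumes all snapshots have the same length.

```python
from functools import reduce


def xor_parity(states):
    """Return the byte-wise XOR of equally sized state snapshots."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*states))


def recover(survivors, parity):
    """Rebuild the single missing snapshot from the survivors and the parity."""
    # XOR-ing the parity with every surviving snapshot cancels them out,
    # leaving only the snapshot of the failed process.
    return xor_parity(survivors + [parity])


# Hypothetical in-memory checkpoints of three processes (same length).
checkpoints = [b"proc0-st", b"proc1-st", b"proc2-st"]

# The parity would be held in the memory of another, non-faulty process.
parity = xor_parity(checkpoints)

# Simulate the failure of process 1 and rebuild its state without any disk.
lost = 1
survivors = [s for i, s in enumerate(checkpoints) if i != lost]
recovered = recover(survivors, parity)
assert recovered == checkpoints[lost]
```

A single parity block only covers one simultaneous failure, which matches the abstract's premise that the expected number of simultaneous failures is small; tolerating more would require a stronger code.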