Diskless Checkpointing with Rollback-Dependency Trackability

Authors:
Raphael Marcos Menderico;Islene Calciolari Garcia
Affiliations:
-;-
Venue:
SRDS '10 Proceedings of the 2010 29th IEEE Symposium on Reliable Distributed Systems
Year:
2010

Abstract

One way to implement fault-tolerant applications is to store their current state in stable memory and, when a failure occurs, restart the application from the last consistent global state. If the number of simultaneous failures is expected to be small, a diskless checkpointing approach can be used, in which a failed process's state can be reconstructed by accessing only the memory of non-faulty processes. In the literature, diskless checkpointing is usually based on synchronous protocols or on properties of the application. In this paper we present a quasi-synchronous diskless checkpointing algorithm, called RDT-Diskless, based on Rollback-Dependency Trackability. The proposed algorithm includes a garbage collection approach that limits the number of checkpoints that must be kept in memory. A framework, called Cheops, was developed, and experimental results were obtained from a commercial cloud environment.
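
The general idea of diskless checkpointing, recovering a failed process's state from the memory of the surviving processes, can be illustrated with a small sketch. The abstract does not say how RDT-Diskless encodes checkpoints, so the example below assumes a simple XOR-parity scheme held by an extra process; the function names, the three equal-sized checkpoints, and the single-failure scenario are all hypothetical and are not taken from the paper or from Cheops.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length buffers."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(states: list[bytes]) -> bytes:
    """Parity checkpoint, assumed to live in the memory of an extra process."""
    return reduce(xor_bytes, states)


def recover(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single lost checkpoint from the survivors and the parity."""
    return reduce(xor_bytes, surviving, parity)


# Three hypothetical application processes with equal-sized checkpoints.
checkpoints = [b"state-P0", b"state-P1", b"state-P2"]
parity = make_parity(checkpoints)

# Simulate the failure of P1: only P0, P2 and the parity remain in memory.
recovered = recover([checkpoints[0], checkpoints[2]], parity)
```

With a single parity buffer this sketch tolerates only one failure at a time, which matches the abstract's premise that the number of simultaneous failures is expected to be small.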