Two approaches are used to reduce the overhead associated with coordinated checkpointing: one is to reduce the number of synchronization messages and the number of checkpoints; the other is to make the checkpointing process non-blocking. In this paper, we introduce the concept of a “computing checkpoint” to design an efficient, consistent, non-blocking coordinated checkpointing algorithm that combines these two approaches. By piggybacking onto the broadcast commit message the information about which processes have taken new checkpoints, the algorithm keeps the checkpoint sequence number of every process consistent across all processes, so that unnecessary checkpoints and orphan messages can be avoided in subsequent execution. The algorithm does not need to block any process and has lower overhead than other proposed consistent coordinated checkpointing algorithms.
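The piggybacking step can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract gives no message formats or data structures, so the `Process` class, the per-process `csn` vector, and the `broadcast_commit` helper below are all hypothetical names chosen for illustration. The sketch only shows how a commit message carrying the set of newly checkpointed processes lets every process arrive at the same view of the checkpoint sequence numbers.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical process holding its view of everyone's sequence numbers."""
    pid: int
    n: int                      # total number of processes (assumed known)
    csn: list = field(default_factory=list)

    def __post_init__(self):
        # csn[j] is this process's view of process j's checkpoint sequence number.
        if not self.csn:
            self.csn = [0] * self.n

    def on_commit(self, checkpointed):
        # The commit message piggybacks the set of processes that took a new
        # checkpoint in this round; advance exactly those entries.
        for j in checkpointed:
            self.csn[j] += 1


def broadcast_commit(processes, checkpointed):
    """Assumed reliable broadcast: deliver the commit message to every process."""
    for proc in processes:
        proc.on_commit(checkpointed)


# One checkpointing round in which only processes 0 and 2 took new checkpoints.
procs = [Process(pid=i, n=4) for i in range(4)]
broadcast_commit(procs, {0, 2})
views = [p.csn for p in procs]
```

After the broadcast, every entry of `views` is identical, which is the consistency property the abstract attributes to the piggybacked commit message. How a process uses that shared view to skip unnecessary checkpoints or detect orphan messages is not described in the abstract and is left out of the sketch.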