Rollback-Dependency Trackability (RDT) is a property that states that all rollback dependencies between local checkpoints are on-line trackable by using a transitive dependency vector. In this paper, we address three fundamental issues in the design of communication-induced checkpointing protocols that ensure RDT. First, we prove that the following intuition commonly assumed in the literature is in fact false: If a protocol forces a checkpoint only at a stronger condition, then it must take, at most, as many forced checkpoints as a protocol based on a weaker condition. This result implies that the common approach of sharpening the checkpoint-inducing condition by piggybacking more control information on each message may not always yield a more efficient protocol. Next, we prove that there is no optimal on-line RDT protocol that takes fewer forced checkpoints than any other RDT protocol for all possible communication patterns. Finally, since comparing checkpoint-inducing conditions is not sufficient for comparing protocol performance, we present some formal techniques for comparing the performance of several existing RDT protocols.
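The transitive dependency vector mentioned above can be illustrated with a minimal sketch. It assumes one common formulation: each process keeps a vector with one entry per process, piggybacks it on every message, merges incoming vectors by component-wise maximum, and stores a copy with each checkpoint. The `Process` class and its method names are hypothetical, chosen only for illustration. The sketch shows the tracking mechanism alone; it does not implement any forced-checkpoint condition, so by itself it does not ensure RDT.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical process that maintains a transitive dependency vector."""
    pid: int
    n: int  # total number of processes (assumed fixed and known)
    dep: list = field(init=False)
    checkpoints: list = field(init=False)

    def __post_init__(self):
        self.dep = [0] * self.n
        self.checkpoints = []
        self.take_checkpoint()  # initial checkpoint, index 0

    def take_checkpoint(self):
        # Save the current vector with the checkpoint, then start a new
        # checkpoint interval by incrementing this process's own entry.
        self.checkpoints.append(list(self.dep))
        self.dep[self.pid] += 1

    def send(self):
        # Piggyback a copy of the vector on the outgoing message.
        return list(self.dep)

    def receive(self, piggyback):
        # Merge by component-wise maximum, so dependencies carried by the
        # sender are inherited transitively by the receiver.
        self.dep = [max(a, b) for a, b in zip(self.dep, piggyback)]


# Usage: P0 -> P1 -> P2, then P2 takes a checkpoint.
p0, p1, p2 = (Process(i, 3) for i in range(3))
p1.receive(p0.send())
p2.receive(p1.send())
p2.take_checkpoint()
print(p2.checkpoints[1])
```

In this run, the vector stored with the second checkpoint of `p2` has a nonzero entry for `p0`, although `p2` never received a message directly from `p0`. The dependency was inherited through the message relayed by `p1`.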