Rollback-Dependency Trackability (RDT) is a property stating that all rollback dependencies between local checkpoints are online trackable by using a transitive dependency vector. The most crucial RDT characterizations introduced in the literature can be represented as certain types of RDT-PXCM-paths. Let the U-path and the V-path be any two types of RDT-PXCM-paths. In this paper, we investigate several properties of communication-induced checkpointing protocols that ensure the RDT property. First, we prove that if an online RDT protocol encounters a U-path at some point of the checkpoint and communication pattern associated with a distributed computation, it also encounters a V-path there. Moreover, if the encountered U-path is invisibly doubled, the corresponding V-path is invisibly doubled as well. We therefore conclude that, for an online RDT protocol, breaking all invisibly doubled U-paths is equivalent to breaking all invisibly doubled V-paths. Next, we demonstrate that a visibly doubled U-path must contain a doubled U-cycle in its causal past. From these results, it can further be deduced that some different checkpointing protocols actually behave identically on all possible patterns. Finally, we present a systematic technique for comparing the performance of online RDT protocols.
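To make the notion of a transitive dependency vector concrete, the following is a minimal sketch of how such a vector is typically maintained. It is not the protocol studied in the paper: the class name `Process`, its methods, and the indexing convention (each process starts in interval 1 and knows nothing about the others) are assumptions made for illustration. An actual RDT protocol would additionally take forced checkpoints so that the vector captures every rollback dependency; that decision logic is omitted here.

```python
class Process:
    """Hypothetical process that maintains a transitive dependency vector."""

    def __init__(self, pid, n):
        self.pid = pid
        # dep[j] = highest checkpoint interval of process j that this
        # process is known to depend on (0 = no dependency yet).
        self.dep = [0] * n
        self.dep[pid] = 1      # assumed convention: own first interval is 1
        self.saved = []        # vector stored alongside each local checkpoint

    def send(self):
        # Piggyback a copy of the current vector on the outgoing message.
        return list(self.dep)

    def receive(self, piggyback):
        # Merge by component-wise maximum, so dependencies propagate
        # transitively along chains of messages.
        self.dep = [max(a, b) for a, b in zip(self.dep, piggyback)]

    def checkpoint(self):
        # Record the dependencies of the checkpoint being taken,
        # then start a new interval.
        self.saved.append(list(self.dep))
        self.dep[self.pid] += 1


# Usage: p0 checkpoints, then a causal chain p0 -> p1 -> p2 is formed.
p0, p1, p2 = (Process(i, 3) for i in range(3))
p0.checkpoint()            # p0 moves to interval 2
p1.receive(p0.send())      # p1 now depends on interval 2 of p0
p2.receive(p1.send())      # p2 inherits that dependency transitively
p2.checkpoint()
print(p2.saved[0])         # vector stored with p2's checkpoint
```

In this run the checkpoint taken by `p2` records a dependency on interval 2 of `p0` even though the two processes never exchanged a message directly, which is the transitive tracking the RDT property relies on.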