Theoretical Analysis for Communication-Induced Checkpointing Protocols with Rollback-Dependency Trackability

Authors:
Jichiang Tsai; Sy-Yen Kuo; Yi-Min Wang
Affiliations:
National Taiwan Univ., Taipei, Taiwan, R.O.C.; National Taiwan Univ., Taipei, Taiwan, R.O.C.; Microsoft Corp., Redmond, WA
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1998

Abstract

Rollback-Dependency Trackability (RDT) is a property that states that all rollback dependencies between local checkpoints are on-line trackable by using a transitive dependency vector. In this paper, we address three fundamental issues in the design of communication-induced checkpointing protocols that ensure RDT. First, we prove that the following intuition commonly assumed in the literature is in fact false: If a protocol forces a checkpoint only at a stronger condition, then it must take, at most, as many forced checkpoints as a protocol based on a weaker condition. This result implies that the common approach of sharpening the checkpoint-inducing condition by piggybacking more control information on each message may not always yield a more efficient protocol. Next, we prove that there is no optimal on-line RDT protocol that takes fewer forced checkpoints than any other RDT protocol for all possible communication patterns. Finally, since comparing checkpoint-inducing conditions is not sufficient for comparing protocol performance, we present some formal techniques for comparing the performance of several existing RDT protocols.
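To make the abstract's terms concrete, the following is a minimal sketch, not the authors' formulation. It models a transitive dependency vector piggybacked on every message, plus two checkpoint-inducing conditions usually called NRAS (No-Receive-After-Send) and FDAS (Fixed-Dependency-After-Send) in the RDT literature. FDAS is the stronger condition: whenever it fires, NRAS would also fire on the same receive. Several details are assumptions made for illustration: the `Process` class, the `simulate` helper, the event-tuple trace format, and the convention that a process's own vector entry starts at 1. Basic (non-forced) checkpoints are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A condition decides whether a forced checkpoint is needed before delivering a message.
Condition = Callable[["Process", List[int]], bool]


@dataclass
class Process:
    """Hypothetical process model carrying a transitive dependency vector."""
    pid: int
    n: int
    dv: List[int] = field(default_factory=list)
    sent_in_interval: bool = False  # has this process sent since its last checkpoint?
    forced: int = 0                 # number of forced checkpoints taken

    def __post_init__(self) -> None:
        # Assumption: the own entry holds the current interval index, starting at 1.
        self.dv = [0] * self.n
        self.dv[self.pid] = 1

    def checkpoint(self) -> None:
        # A checkpoint opens a new interval and clears the send flag.
        self.dv[self.pid] += 1
        self.sent_in_interval = False

    def send(self) -> List[int]:
        self.sent_in_interval = True
        return list(self.dv)  # piggyback a copy of the vector

    def receive(self, m_dv: List[int], must_force: Condition) -> None:
        if must_force(self, m_dv):
            self.forced += 1
            self.checkpoint()
        # Track dependencies transitively: component-wise maximum.
        self.dv = [max(a, b) for a, b in zip(self.dv, m_dv)]


def nras(p: Process, m_dv: List[int]) -> bool:
    # Force before any receive that follows a send in the current interval.
    return p.sent_in_interval


def fdas(p: Process, m_dv: List[int]) -> bool:
    # Force only if, additionally, the message would raise some vector entry.
    return p.sent_in_interval and any(a > b for a, b in zip(m_dv, p.dv))


Event = Tuple[str, int, str]  # (kind, pid, message id); hypothetical trace format


def simulate(n: int, events: List[Event], must_force: Condition) -> int:
    """Replay a trace and return the total number of forced checkpoints."""
    procs = [Process(i, n) for i in range(n)]
    in_transit: Dict[str, List[int]] = {}
    for kind, pid, msg in events:
        if kind == "send":
            in_transit[msg] = procs[pid].send()
        elif kind == "recv":
            procs[pid].receive(in_transit.pop(msg), must_force)
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return sum(p.forced for p in procs)


trace = [
    ("send", 1, "m1"), ("send", 1, "m3"),
    ("recv", 0, "m1"), ("send", 0, "m2"),
    ("recv", 0, "m3"), ("recv", 1, "m2"),
]
print(simulate(2, trace, nras), simulate(2, trace, fdas))  # 2 1
```

On this particular trace, the stronger FDAS condition forces one checkpoint and NRAS forces two, because `m3` carries no dependency that P0 has not already recorded. As the abstract argues, however, such a per-trace observation does not establish that a stronger condition takes at most as many forced checkpoints on every communication pattern.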