Guaranteed Mutually Consistent Checkpointing in Distributed Computations

Authors:
Zhonghua Yang;Chengzheng Sun;Abdul Sattar;Yanyan Yang
Venue:
ASIAN '98 Proceedings of the 4th Asian Computing Science Conference on Advances in Computing Science
Year:
1998

Abstract

In this paper, we explore the isomorphism between vector time and causality to characterize the consistency of a set of checkpoints in a distributed computation. A necessary and sufficient condition for determining whether a set of checkpoints can form a consistent global checkpoint is presented and proved using this isomorphism. To the best of our knowledge, this is the first attempt to use the isomorphism for this purpose. This condition leads to a simple and straightforward algorithm for guaranteed mutually consistent global checkpointing. In our approach, a process can take a checkpoint whenever and wherever it wants, while other related processes may be asked to take an additional checkpoint to ensure mutual consistency. We also show how this condition and the resulting algorithm can be used to obtain the maximum and minimum global checkpoints, another important paradigm for distributed applications.
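
The abstract does not state the condition itself, so the following is only an illustrative sketch of the kind of vector-time test it alludes to, not the paper's own formulation. It uses the standard textbook criterion for a consistent cut: with one checkpoint per process, the set is consistent if no checkpoint has seen more of process i's history than process i's own checkpoint has. The names `is_consistent` and `orphan_pairs` are hypothetical, and the sketch assumes that every local event (including taking a checkpoint and sending a message) increments the process's own vector component.

```python
from typing import List, Tuple

VectorClock = List[int]


def orphan_pairs(checkpoints: List[VectorClock]) -> List[Tuple[int, int]]:
    """Return all (i, j) where checkpoint j knows more about process i
    than process i's own checkpoint does.

    checkpoints[i] is the vector timestamp recorded with the checkpoint
    of process i (assumption: exactly one checkpoint per process).
    """
    n = len(checkpoints)
    return [
        (i, j)
        for i in range(n)
        for j in range(n)
        if i != j and checkpoints[j][i] > checkpoints[i][i]
    ]


def is_consistent(checkpoints: List[VectorClock]) -> bool:
    """Standard consistent-cut test on vector timestamps:
    for all i, j: checkpoints[i][i] >= checkpoints[j][i]."""
    return not orphan_pairs(checkpoints)


# Three processes; no checkpoint depends on an event after another checkpoint.
good = [[2, 0, 0], [1, 3, 0], [0, 0, 1]]

# Process 1's checkpoint reflects event 2 of process 0, but process 0
# checkpointed after event 1, so a received message would be "unsent".
bad = [[1, 0, 0], [2, 3, 0], [0, 0, 1]]
```

In the second example, `orphan_pairs(bad)` identifies process 0 as the one whose checkpoint is too early relative to process 1's. Under the approach described in the abstract, this is the situation in which a related process would be asked to take an additional checkpoint.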