An interval of a sequential process is a sequence of consecutive events of this process. The set of intervals defined on a distributed computation defines an abstraction of this distributed computation, and the traditional causality relation on events induces a relation on the set of intervals that we call I-precedence. An important question is then, "Is the interval-based abstraction associated with a distributed computation consistent?" To answer this question, this paper introduces a consistency criterion named interval consistency (IC). Intuitively, this criterion states that an interval-based abstraction of a distributed computation is consistent if its I-precedence relation does not contradict the sequentiality of each process. More formally, IC is defined as a property of a precedence graph. Interestingly, the IC criterion can be operationally characterized in terms of timestamps (whose values belong to a lattice). The paper uses this characterization to design a versatile protocol that, given intervals defined by a daemon whose behavior is unpredictable, breaks them (in a nontrivial manner) in order to produce an abstraction satisfying the IC criterion. Applications to communication-induced checkpointing are suggested.
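The intuition behind the criterion can be illustrated with a minimal Python sketch. It rests on a simplifying assumption that is not taken from the paper: the abstraction is treated as consistent when the graph formed by process-order edges (each interval precedes the next interval of the same process) and message-induced edges (the sending interval precedes the receiving interval) has no cycle. The names `precedence_graph` and `is_interval_consistent`, and the encoding of an interval as a `(process, index)` pair, are hypothetical and chosen only for illustration; the paper's formal definition and its timestamp-based characterization may differ.

```python
from collections import defaultdict, deque


def precedence_graph(interval_counts, messages):
    """Build a precedence graph over intervals (illustrative assumption).

    interval_counts: dict mapping a process name to its number of intervals.
    messages: list of (sender_interval, receiver_interval) pairs, where an
              interval is a (process, index) tuple.
    """
    edges = defaultdict(set)
    # Sequentiality: interval i of a process precedes interval i + 1.
    for proc, count in interval_counts.items():
        for i in range(count - 1):
            edges[(proc, i)].add((proc, i + 1))
    # Causality: the interval containing a send precedes the interval
    # containing the matching receive.
    for src, dst in messages:
        if src != dst:
            edges[src].add(dst)
    return edges


def is_interval_consistent(interval_counts, messages):
    """Return True when the graph above is acyclic (Kahn's algorithm)."""
    edges = precedence_graph(interval_counts, messages)
    nodes = {(p, i) for p, c in interval_counts.items() for i in range(c)}
    indegree = {n: 0 for n in nodes}
    for dsts in edges.values():
        for d in dsts:
            indegree[d] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        n = queue.popleft()
        visited += 1
        for d in edges.get(n, ()):
            indegree[d] -= 1
            if indegree[d] == 0:
                queue.append(d)
    return visited == len(nodes)


# P sends a message and later receives one, all inside a single interval,
# while Q receives in its first interval and sends from its second.
# Then Q's second interval precedes P's interval, which precedes Q's first.
coarse = is_interval_consistent(
    {"P": 1, "Q": 2},
    [(("P", 0), ("Q", 0)), (("Q", 1), ("P", 0))],
)

# Breaking P's interval between the send and the receive removes the cycle.
refined = is_interval_consistent(
    {"P": 2, "Q": 2},
    [(("P", 0), ("Q", 0)), (("Q", 1), ("P", 1))],
)
```

In the first case the induced relation places Q's second interval before Q's first, which contradicts Q's own sequential order. The second case mirrors, in a very reduced form, the idea of breaking an interval to restore consistency; it is not the protocol designed in the paper.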