When consensus meets self-stabilization

Authors:
Shlomi Dolev;Ronen I. Kat;Elad M. Schiller
Affiliations:
Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, Israel;Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, Israel;Division of Computer Science and Engineering, Chalmers University of Technology, Göteborg, Sweden
Venue:
OPODIS'06 Proceedings of the 10th international conference on Principles of Distributed Systems
Year:
2006



Abstract

This paper presents a self-stabilizing failure detector, asynchronous consensus, and replicated state-machine algorithm suite, the components of which can be started in an arbitrary state and converge to act as a virtual state-machine. Self-stabilizing algorithms can cope with transient faults. Transient faults can alter the system state to an arbitrary state and hence cause a temporary violation of the safety property of consensus. New requirements for consensus that fit the ongoing nature of self-stabilizing algorithms are presented. The wait-free consensus (and replicated state-machine) algorithm presented is a classic combination of a failure detector and a (memory-bounded) rotating-coordinator consensus that satisfies both eventual safety and eventual liveness. Several new techniques and paradigms are introduced. The bounded-memory failure detector abstracts away synchronization assumptions using bounded heartbeat counters combined with a balance-unbalance mechanism. The practically infinite paradigm is introduced in the scope of self-stabilization, where an execution of, say, 2^64 sequential steps is regarded as (practically) infinite. Finally, we present the first self-stabilizing wait-free reset mechanism that ensures eventual safety and can be used in other scopes.
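
To give a feel for why an execution of 2^64 sequential steps can be regarded as practically infinite, the following minimal sketch converts that step count into wall-clock time. The step rates (one step per microsecond, nanosecond, and picosecond) and the helper name `years_to_exhaust` are illustrative assumptions, not figures or names taken from the paper.

```python
# Rough arithmetic only: how long would 2**64 sequential steps take?
# The step rates below are assumed for illustration, not from the paper.

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # Julian year, in seconds


def years_to_exhaust(steps: int, steps_per_second: float) -> float:
    """Return the number of years needed to execute `steps` sequential steps."""
    return steps / steps_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    total_steps = 2 ** 64
    for rate in (1e6, 1e9, 1e12):  # hypothetical steps per second
        years = years_to_exhaust(total_steps, rate)
        print(f"{rate:.0e} steps/s -> about {years:,.1f} years")
```

Under the assumed rate of one step per nanosecond, the count is exhausted only after several centuries, which is the intuition behind treating such a bound as unreachable in practice.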