We study the problems of failure detection and consensus in asynchronous systems in which processes may crash and recover, and links may lose messages. We first propose new failure detectors that are particularly well suited to the crash-recovery model. We next determine under what conditions stable storage is necessary to solve consensus in this model. Using the new failure detectors, we give two consensus algorithms that match these conditions: one requires stable storage and the other does not. Both algorithms tolerate link failures and are particularly efficient in the runs that are most likely in practice: those with no failures and no failure detector mistakes. In such runs, consensus is achieved within 3δ time and with 4n messages, where δ is the maximum message delay and n is the number of processes in the system.
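
To make the closing figures concrete, the following minimal sketch evaluates the stated bounds (3δ time, 4n messages) for a run with no failures and no failure detector mistakes. The function name `failure_free_cost` and the sample values (5 processes, 10 ms maximum delay) are assumptions chosen for illustration; they do not come from the paper, and the sketch only performs the arithmetic rather than implementing either consensus algorithm.

```python
def failure_free_cost(n: int, delta: float) -> tuple[float, int]:
    """Evaluate the abstract's bounds for a failure-free run.

    n     -- number of processes in the system
    delta -- maximum message delay (any time unit; the result uses the same unit)

    Returns (time bound, message count) = (3 * delta, 4 * n).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if delta < 0:
        raise ValueError("delta must be non-negative")
    return 3 * delta, 4 * n


# Hypothetical deployment: 5 processes, maximum message delay of 10 ms.
time_bound_ms, message_count = failure_free_cost(n=5, delta=10.0)
print(f"decide within {time_bound_ms} ms using {message_count} messages")
```

Under these assumed values the bound works out to 30 ms and 20 messages; the message count grows linearly with n, while the time bound depends only on δ.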