The impossibility of boosting distributed service resilience

Authors:
Paul Attie;Rachid Guerraoui;Petr Kuznetsov;Nancy Lynch;Sergio Rajsbaum
Affiliations:
Department of Computer Science and Center for Advanced Mathematical Studies, American University of Beirut, Bliss Hall, PO Box 11-0236, Riad El-Solh, Beirut 1107 2020, Lebanon;Distributed Programming Laboratory, EPFL, LPD (Station 14), I&C, CH 1015 Lausanne, Switzerland;Technische Universität Berlin/Deutsche Telekom Laboratories, FG INET, Sekr. TEL 4, Ernst-Reuter Platz 7, D-10587 Berlin, Germany;MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar Street (32-G668), Cambridge, MA 02139, USA;Instituto de Matemáticas, Universidad Nacional Autónoma de México (UNAM), D.F. 04510, Mexico
Venue:
Information and Computation
Year:
2011

Abstract

We study f-resilient services, which are guaranteed to operate as long as no more than f of the associated processes fail. We prove three theorems asserting the impossibility of boosting the resilience of such services. Our first theorem allows any connection pattern between processes and services but assumes these services to be atomic (linearizable) objects. This theorem says that no distributed system in which processes coordinate using f-resilient atomic objects and reliable registers can solve the consensus problem in the presence of f+1 undetectable process stopping failures. In contrast, we show that it is possible to boost the resilience of some systems solving problems easier than consensus: for example, the 2-set-consensus problem is solvable for 2n processes and 2n-1 failures (i.e., wait-free) using n-process consensus services resilient to n-1 failures (wait-free). Our proof is short and self-contained. We then introduce the larger class of failure-oblivious services. These are services that cannot use information about failures, although they may behave more flexibly than atomic objects. An example of such a service is totally ordered broadcast. Our second theorem generalizes the first theorem and its proof to failure-oblivious services. Our third theorem allows the system to contain failure-aware services, such as failure detectors, in addition to failure-oblivious services. This theorem requires that each failure-aware service be connected to all processes; thus, f+1 process failures overall can disable all the failure-aware services. In contrast, it is possible to boost the resilience of a system solving consensus using failure-aware services if arbitrary connection patterns between processes and services are allowed: consensus is solvable for any number of failures using only 1-resilient 2-process perfect failure detectors. 
As far as we know, this is the first time a unified framework has been used to describe both atomic and non-atomic objects, and the first time boosting analysis has been performed for services more general than atomic objects.
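
The 2-set-consensus claim in the abstract can be illustrated with a small sequential simulation. This is a minimal sketch under stated assumptions, not necessarily the construction used in the paper: the 2n processes are split into two groups of n, each group shares one n-process consensus service, and every process decides whatever its group's service returns. The names WaitFreeConsensus and two_set_consensus are hypothetical, and crashes are modeled only as processes that stop before taking any step.

```python
class WaitFreeConsensus:
    """Hypothetical stand-in for an n-process wait-free consensus service.

    The first proposed value wins and every later caller receives it.
    This is a sequential model, so atomicity is trivially satisfied.
    """

    def __init__(self):
        self._decided = False
        self._value = None

    def propose(self, value):
        if not self._decided:
            self._value = value
            self._decided = True
        return self._value


def two_set_consensus(inputs, crashed=frozenset()):
    """Simulate 2n processes solving 2-set consensus.

    inputs:  list of 2n proposals; the list index is the process id.
    crashed: ids of processes that stop before taking any step
             (an assumption of this sketch, not a general failure model).
    Returns a dict {pid: decision} for the surviving processes.
    """
    assert len(inputs) % 2 == 0, "this sketch expects an even number of processes"
    n = len(inputs) // 2
    # One consensus service per group of n processes.
    services = [WaitFreeConsensus(), WaitFreeConsensus()]
    decisions = {}
    for pid, value in enumerate(inputs):
        if pid in crashed:
            continue
        group = 0 if pid < n else 1
        # A process only ever talks to its own group's service, so it
        # never waits on any other process: it decides in one call.
        decisions[pid] = services[group].propose(value)
    return decisions


if __name__ == "__main__":
    proposals = [10, 11, 12, 13, 14, 15]  # 2n = 6 processes, n = 3
    print(two_set_consensus(proposals))
    # Even with 2n-1 = 5 crashes, the lone survivor still decides.
    print(two_set_consensus(proposals, crashed={0, 1, 2, 3, 4}))
```

Each service outputs a single value, so at most two distinct values are ever decided, and each decided value was proposed by some process. Because a surviving process depends only on its own call to a wait-free service, it decides no matter how many of the other 2n-1 processes have stopped.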