A closer look at fault tolerance

  • Author: Gadi Taubenfeld
  • Affiliation: The Interdisciplinary Center, Herzliya, Israel
  • Venue: PODC '12 Proceedings of the 2012 ACM Symposium on Principles of Distributed Computing
  • Year: 2012

Abstract

The traditional notion of fault tolerance requires that all the correct participating processes eventually terminate, and thus is not sensitive to the number of correct processes that properly terminate when failures occur. Intuitively, an algorithm that, in the presence of any number of faults, always guarantees that all the correct processes except maybe one properly terminate is more resilient to faults than an algorithm that, in the presence of a single fault, does not even guarantee that a single correct process ever terminates. However, according to the standard notion of fault tolerance, both algorithms are classified as algorithms that cannot tolerate a single fault. To overcome this difficulty, we generalize the traditional notion of fault tolerance in a way that captures more sensitive information about the resiliency of an algorithm. We then present several algorithms for solving classical problems which are resilient under the new notion. It is well known that, in asynchronous systems where processes communicate either by reading and writing atomic registers or by sending and receiving messages, important problems such as consensus, set-consensus, election, perfect renaming, and implementations of a test-and-set bit, a shared stack, a swap object, and a fetch-and-add object have no deterministic solutions that can tolerate even a single fault. We show that while some of these problems have solutions which guarantee that, in the presence of any number of faults, most of the correct processes properly terminate, other problems do not even have solutions which guarantee that, in the presence of just one fault, at least one correct process properly terminates.
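
The distinction the abstract draws between "all correct processes terminate" and "all correct processes except maybe one terminate" can be illustrated with a small sketch. The Python snippet below is an informal illustration only, not the paper's formal definition; the function names and the parameter `max_left_behind` are hypothetical. It checks the two termination conditions on a run described by the set of correct processes and the set of processes that actually terminated.

```python
# Toy illustration (not from the paper): contrasting the traditional
# termination requirement with a relaxed one that allows a bounded number
# of correct processes to be "left behind".

def traditional_ok(correct, terminated):
    """Traditional fault tolerance: every correct process must terminate."""
    return correct <= terminated  # subset test on sets of process ids

def relaxed_ok(correct, terminated, max_left_behind=1):
    """Relaxed requirement: all correct processes except at most
    `max_left_behind` of them must terminate."""
    return len(correct - terminated) <= max_left_behind

# Example run: p1..p3 are correct, p4 has crashed; p3 is correct but never terminates.
correct = {"p1", "p2", "p3"}
terminated = {"p1", "p2"}

print(traditional_ok(correct, terminated))  # False: p3 does not terminate
print(relaxed_ok(correct, terminated))      # True: only one correct process is left behind
```

Under the traditional notion both a run in which only one correct process is stuck and a run in which no correct process terminates count equally as failures to tolerate the fault; the relaxed predicate distinguishes the two, which is the kind of sensitivity the generalized notion is meant to capture.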