Resilience of mutual exclusion algorithms to transient memory faults

Authors:
Thomas Moscibroda; Rotem Oshman
Affiliations:
Microsoft Research, Redmond, WA, USA; MIT, Cambridge, MA, USA
Venue:
Proceedings of the 30th annual ACM SIGACT-SIGOPS symposium on Principles of distributed computing
Year:
2011

Abstract

We study the behavior of mutual exclusion algorithms in the presence of unreliable shared memory subject to transient memory faults. It is well known that classical 2-process mutual exclusion algorithms, such as Dekker's and Peterson's algorithms, are not fault-tolerant; in this paper we ask what degree of fault tolerance can be achieved using the same restricted resources as Dekker's and Peterson's algorithms, namely, three binary read/write registers. We show that if one memory fault can occur, it is not possible to guarantee both mutual exclusion and deadlock-freedom using three binary registers; this holds in general when fewer than 2f+1 binary registers are used and f of them may be faulty. Hence we focus on algorithms that guarantee (a) mutual exclusion and starvation-freedom in fault-free executions, and (b) only mutual exclusion in faulty executions. We show that using only three binary registers it is possible to design a 2-process mutual exclusion algorithm which tolerates a single memory fault in this manner. Further, by replacing one read/write register with a test&set register, we can guarantee mutual exclusion in executions where one variable experiences unboundedly many faults. In the more general setting where up to f registers may be faulty, we show that it is not possible to guarantee mutual exclusion using 2f+1 binary read/write registers if each faulty register can exhibit unboundedly many faults. On the positive side, we show that an n-variable single-fault-tolerant algorithm satisfying certain conditions can be transformed into an ((n-1)f+1)-variable f-fault-tolerant algorithm with the same progress guarantee as the original. In combination with our three-variable algorithm (n = 3, so (n-1)f+1 = 2f+1), this implies that there is a (2f+1)-variable mutual exclusion algorithm tolerating a single fault in up to f variables without violating mutual exclusion.
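
The abstract's starting point, that Peterson's algorithm is not fault-tolerant, can be illustrated with a small deterministic replay. The sketch below is not taken from the paper. It assumes a simplified model in which the entry test reads both registers in one step and a transient fault is a single bit flip of the `turn` register. The names `run`, `fault_free`, and `faulty` are illustrative only.

```python
# Illustrative sketch, not the paper's construction: a deterministic replay
# of Peterson's 2-process algorithm on three binary registers.
# Assumptions: the entry test reads both registers atomically, and a
# transient fault is modeled as one bit flip of `turn`.

def run(schedule):
    """Replay a schedule and return the largest number of processes that
    were in the critical section at the same time."""
    reg = {"flag0": 0, "flag1": 0, "turn": 0}  # the three binary registers
    pc = [0, 0]                # program counter of each process
    in_cs = [False, False]     # who is currently in the critical section
    worst = 0
    for step in schedule:
        if step == "fault":
            reg["turn"] ^= 1   # hypothetical transient fault: flip one bit
        else:
            i, j = step, 1 - step
            if pc[i] == 0:     # announce interest
                reg[f"flag{i}"] = 1
                pc[i] = 1
            elif pc[i] == 1:   # give priority to the other process
                reg["turn"] = j
                pc[i] = 2
            elif pc[i] == 2:   # entry test: wait while the other has priority
                if not (reg[f"flag{j}"] == 1 and reg["turn"] == j):
                    in_cs[i] = True
                    pc[i] = 3
            else:              # leave the critical section
                in_cs[i] = False
                reg[f"flag{i}"] = 0
                pc[i] = 0
        worst = max(worst, sum(in_cs))
    return worst

# Process 0 enters; process 1 then tries twice and stays blocked.
fault_free = [0, 0, 0, 1, 1, 1, 1]
# Same prefix, but `turn` is flipped before process 1 retries its test.
faulty = [0, 0, 0, 1, 1, 1, "fault", 1]

print(run(fault_free))  # 1: mutual exclusion holds
print(run(faulty))      # 2: both processes are in the critical section
```

In the faulty schedule, process 1 is correctly blocked until the flip makes `turn` point at process 1, after which its entry test passes while process 0 is still inside. A single fault on one of the three registers is therefore enough to break mutual exclusion in the unmodified algorithm.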