Controlling memory access concurrency in efficient fault-tolerant parallel algorithms

Authors:
Paris C. Kanellakis;Dimitrios Michailidis;Alex Allister Shvartsman
Affiliations:
Department of Computer Science, Brown University, Box 1910, Providence, RI;Department of Computer Science, Brown University, Box 1910, Providence, RI;Laboratory for Computer Science, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, MA
Venue:
Nordic Journal of Computing
Year:
1995

Abstract

The CRCW PRAM under dynamic fail-stop (no restart) processor behavior is a fault-prone multiprocessor model for which it is possible both to guarantee reliability and to preserve efficiency. To handle dynamic faults, some redundancy is necessary in the form of many processors concurrently performing a common read or write task. In this paper we show how to significantly decrease this concurrency by bounding it in terms of the number of actual processor faults. We describe a low-concurrency, efficient, and fault-tolerant algorithm for the Write-All primitive: "using ≤ N processors, write 1's into N locations". This primitive can serve as the basis for efficient fault-tolerant simulations of algorithms written for fault-free PRAMs on fault-prone PRAMs. For any dynamic failure pattern F, our algorithm has total write concurrency ≤ |F| and total read concurrency ≤ 7 |F| log N, where |F| is the number of processor faults (for example, there is no concurrency in a run without failures); note that previous algorithms used Ω(N log N) concurrency even in the absence of faults. We also describe a technique for limiting the per-step concurrency and present an optimal fault-tolerant EREW PRAM algorithm for Write-All when all processor faults are initial.
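
The tension between redundancy and concurrency described in the abstract can be illustrated with a small simulation. The sketch below is not the algorithm from the paper. It compares two deliberately naive schedules for Write-All and assumes that write concurrency in a step counts the writers beyond the first for each cell (writers minus distinct cells written). The names `write_concurrency`, `run_write_all`, `partition`, and `lockstep` are hypothetical and exist only for this illustration.

```python
def write_concurrency(step_writes):
    """Redundant writes in one step: number of writers minus distinct cells written.

    Assumption: this is the concurrency measure; it is zero when no two
    processors write the same cell in the same step.
    """
    return len(step_writes) - len(set(step_writes))


def run_write_all(n, schedule, failures):
    """Simulate n processors trying to write 1 into n cells for n steps.

    schedule(p, t) returns the cell that processor p writes at step t, or None.
    failures maps a processor id to the step from which it performs no more
    work (fail-stop, no restart). Processors not listed never fail.
    Returns (cells, total_write_concurrency).
    """
    cells = [0] * n
    total = 0
    for t in range(n):
        writes = []
        for p in range(n):
            if p in failures and failures[p] <= t:
                continue  # processor p has already stopped
            cell = schedule(p, t)
            if cell is not None:
                writes.append(cell)
        for cell in writes:
            cells[cell] = 1
        total += write_concurrency(writes)
    return cells, total


def partition(p, t):
    """No redundancy: processor p writes only cell p, at step 0."""
    return p if t == 0 else None


def lockstep(p, t):
    """Full redundancy: every live processor writes cell t at step t."""
    return t


if __name__ == "__main__":
    print(run_write_all(4, partition, {}))       # complete, no concurrency
    print(run_write_all(4, partition, {2: 0}))   # cell 2 is never written
    print(run_write_all(4, lockstep, {}))        # complete, heavy concurrency
    print(run_write_all(4, lockstep, {2: 0, 3: 1}))
```

In this toy setting the `partition` schedule has zero concurrency but leaves a cell unwritten as soon as one processor fails, while the `lockstep` schedule tolerates any failure pattern that leaves one processor alive but pays for it with concurrency in every step, even when nothing fails. The abstract's result sits between these extremes: concurrency that is bounded by the number of faults that actually occur.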