Gracefully Degrading Systems Using the Bulk-Synchronous Parallel Model with Randomised Shared Memory

Authors: A. Savva, T. Nanya
Venue: FTCS '95: Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
Year: 1995

References (12):
Randomized and deterministic simulations of PRAMs by parallel machines with restricted granularity of parallel memories. Acta Informatica.
A bridging model for parallel computation. Communications of the ACM.
Self-stabilization. ACM Computing Surveys (CSUR).
Fail-stop processors: an approach to designing fault-tolerant computing systems. ACM Transactions on Computer Systems (TOCS).
Synthesis of Algorithm-Based Fault-Tolerant Systems from Dependence Graphs. IEEE Transactions on Parallel and Distributed Systems.
Programmer-Transparent Coordination of Recovering Concurrent Processes: Philosophy and Rules for Efficient Implementation. IEEE Transactions on Software Engineering.
Direct Bulk-Synchronous Parallel Algorithms. SWAT '92: Proceedings of the Third Scandinavian Workshop on Algorithm Theory.
Simulation-based Comparison of Hash Functions for Emulated Shared Memory. PARLE '93: Proceedings of the 5th International PARLE Conference on Parallel Architectures and Languages Europe.
Design of a Router for Fault-Tolerant Networks. PCRCW '94: Proceedings of the First International Workshop on Parallel Computer Routing and Communication.
On the Practical Efficiency of Randomized Shared Memory. CONPAR '92/VAPP V: Proceedings of the Second Joint International Conference on Vector and Parallel Processing: Parallel Processing.
Randomized Shared Memory: Concept and Efficiency of a Scalable Shared Memory Scheme. Parallel Computer Architectures: Theory, Hardware, Software, Applications.
The NYU Ultracomputer: Designing an MIMD Shared Memory Parallel Computer. IEEE Transactions on Computers.

Cited by (2):
A comprehensive bibliography of distributed shared memory. ACM SIGOPS Operating Systems Review.
A Gracefully Degrading Massively Parallel System Using the BSP Model, and Its Evaluation. IEEE Transactions on Computers.


Abstract

The bulk-synchronous parallel model (BSPM) was proposed by Valiant (1990) as a bridging model for parallel computation. With randomised shared memory (RSM), the model offers an asymptotically optimal emulation of the PRAM. Using the BSPM with RSM, we show how a gracefully degrading massively parallel system can be obtained through two mechanisms: memory duplication, which ensures global memory integrity and speeds up reconfiguration; and a global reconfiguration method that restores the logical properties of the system after a fault occurs. We assume fail-stop processors, single faults, no spare processors, and no significant loss of network throughput as a result of faults. Work done during reconfiguration is shared equally among the live processors, with minimal coordination. The overhead of the scheme and the degree of graceful degradation achieved depend on the program being executed. We evaluate the reconfiguration, overhead, and graceful degradation of the system experimentally.
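The abstract names the ingredients (randomised placement of shared memory, duplicated copies, and reconfiguration work divided equally among live processors) but not the hash functions or the exact protocol. As a hedged illustration only, the Python sketch below assumes: a Carter-Wegman universal hash family h(x) = ((ax + b) mod p) mod m, which may differ from the hash functions the authors used; two copies of every shared address on distinct processors; logical processor numbers equal to each processor's rank in the sorted live list; and, after a single fail-stop fault, re-hashing every address over the survivors, with the re-copy work dealt out round-robin by rank. All names (`PRIME`, `UniversalHash`, `draw_hash`, `homes`, `reconfigure`) are hypothetical.

```python
import random
from dataclasses import dataclass

PRIME = 2_147_483_647  # 2**31 - 1; assumed larger than every shared address


@dataclass(frozen=True)
class UniversalHash:
    """h(x) = ((a*x + b) mod p) mod m. Assumption: Carter-Wegman hashing,
    not necessarily the hash functions used in the paper."""
    a: int
    b: int

    def __call__(self, x: int, m: int) -> int:
        return ((self.a * x + self.b) % PRIME) % m


def draw_hash(rng: random.Random) -> UniversalHash:
    """Pick a random member of the (assumed) hash family."""
    return UniversalHash(rng.randrange(1, PRIME), rng.randrange(PRIME))


def homes(addr: int, live: list, h1: UniversalHash, h2: UniversalHash) -> tuple:
    """Two distinct live processors holding copies of `addr` (needs len(live) >= 2)."""
    n = len(live)
    i = h1(addr, n)
    j = (i + 1 + h2(addr, n - 1)) % n  # offset is in 1..n-1, so j != i
    return live[i], live[j]


def reconfigure(addresses, live, failed, h1, h2):
    """Restore a two-copy placement over the survivors of one fail-stop fault.

    Returns (new_live, plan); plan[p] lists (addr, source, new_homes) triples
    that survivor p re-copies. Each survivor derives its share from its rank
    alone, so no messages are needed to divide the work.
    """
    if failed not in live:
        raise ValueError("failed processor is not in the live set")
    new_live = [p for p in live if p != failed]
    n = len(new_live)
    plan = {p: [] for p in new_live}
    for k, addr in enumerate(sorted(addresses)):
        a, b = homes(addr, live, h1, h2)
        source = b if a == failed else a  # copies are distinct, so one survives
        worker = new_live[k % n]          # round-robin: shares differ by at most 1
        plan[worker].append((addr, source, homes(addr, new_live, h1, h2)))
    return new_live, plan


# Usage: 8 processors, 1000 shared addresses, processor 3 fails.
rng = random.Random(1995)
h1, h2 = draw_hash(rng), draw_hash(rng)
new_live, plan = reconfigure(range(1000), list(range(8)), failed=3, h1=h1, h2=h2)
print(new_live)                               # [0, 1, 2, 4, 5, 6, 7]
print(sorted(len(v) for v in plan.values()))  # [142, 143, 143, 143, 143, 143, 143]
```

Because the two copies of an address always sit on distinct processors, a single fail-stop fault leaves one readable source for every address, and each survivor can compute its own share of the re-copying without negotiating with the others. Re-hashing over the smaller live set is the simplest way to restore a uniform logical numbering in this sketch, but it relocates most of the data.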