Gracefully Degrading Systems Using the Bulk-Synchronous Parallel Model with Randomised Shared Memory

  • Authors:
  • A. Savva;T. Nanya

  • Affiliations:
  • -;-

  • Venue:
  • FTCS '95 Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
  • Year:
  • 1995

Quantified Score

Hi-index 0.00

Visualization

Abstract

Abstract: The bulk-synchronous parallel model (BSPM) was proposed as a bridging model for parallel computation by Valiant (1990). By using randomised shared memory (RSM), this model offers an asymptotically optimal emulation of the PRAM. By using the BSPM with RSM, we show how a gracefully degrading massively parallel system can be obtained through: memory duplication to ensure global memory integrity, and to speed up the reconfiguration; a global reconfiguration method that restores the logical properties of the system, after a fault occurs. We assume fail-stop processors, single faults, no spare processors, and no significant loss of network throughput as a result of faults. Work done during reconfiguration is shared equally among the live processors, with minimal coordination. The overhead of the scheme and the graceful degradation achieved depend on the program being executed. We evaluate the reconfiguration, overhead, and graceful degradation of the system experimentally.