Protocols for Fault-Tolerant Distributed-Shared-Memory on the SOME-Bus Multiprocessor Architecture

  • Authors:
  • Diana Hecht;Constantine Katsinis

  • Affiliations:
  • -;-

  • Venue:
  • IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

The Simultaneous Optical Multiprocessor Exchange Bus (SOME-Bus) is a low-latency, high-bandwidth interconnection network that directly links arbitrary pairs of processor nodes without contention, and can efficiently interconnect over one hundred nodes. Each node has a dedicated output channel and an array of receivers, with one receiver dedicated to every other node's output channel. The SOME-Bus eliminates the need for global arbitration and provides bandwidth that scales directly with the number of nodes in the system. Under the Distributed Shared Memory (DSM) paradigm, the SOME-bus allows strong integration of the transmitter, receiver and cache controller hardware to produce a highly integrated system-wide cache coherence mechanism. Backward Error Recovery fault-tolerance techniques can rely on DSM data replication and SOME-Bus broadcasts with little additional network traffic and corresponding performance degradation. This paper presents three protocols for fault-tolerant DSM and uses simulation to examine the performance of the protocols on the SOME-Bus multiprocessor architecture.