Fault-Tolerant Distributed Shared Memory on a Broadcast-Based Architecture

  • Authors:
  • Constantine Katsinis;Diana Hecht

  • Affiliations:
  • IEEE;IEEE

  • Venue:
  • IEEE Transactions on Parallel and Distributed Systems
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

Due to advances in fiber-optics and VLSI technology, interconnection networks that allow multiple simultaneous broadcasts are becoming feasible. Distributed-shared-memory implementations on such networks promise high performance even for applications with small granularity. This paper presents the architecture of one such implementation, called the Simultaneous Optical Multiprocessor Exchange Bus, and examines the performance of augmented DSM protocols that exploit the natural duplication of data to maintain a recovery memory in each processing node and provide basic fault tolerance. Simulation results show that the additional data duplication necessary to create fault-tolerant DSM causes no reduction in system performance during normal operation and eliminates most of the overhead at checkpoint creation. Under certain conditions, data blocks that are duplicated to maintain the recovery memory are utilized by the underlying DSM protocol, reducing network traffic, and increasing the processor utilization significantly.