Performance of Fault-Tolerant Distributed Shared Memory on Broadcast- and Switch-Based Architectures

  • Authors: Constantine Katsinis
  • Affiliations: Drexel University, Philadelphia, PA
  • Venue: IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 14 - Volume 15
  • Year: 2005

Abstract

This paper presents a set of distributed-shared-memory (DSM) protocols that provide fault tolerance on broadcast-based and switch-based architectures with no decrease in performance. These augmented DSM protocols combine the data duplication required for fault tolerance with the data duplication that naturally arises in distributed-shared-memory implementations. The recovery memory at each backup node is kept consistent at all times and is accessible to all processes executing at that node. Simulation results show that the additional data duplication needed to create fault-tolerant DSM causes no reduction in system performance during normal operation and eliminates most of the overhead of checkpoint creation. Data blocks that are duplicated to maintain the recovery memory are also used by the DSM protocol, which reduces network traffic and significantly increases processor utilization. We use simulation and multiprocessor address trace files to compare the performance of a broadcast architecture called the SOME-Bus with that of two representative switch architectures.