IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
ICPP '99 Proceedings of the 1999 International Conference on Parallel Processing
This paper presents a set of distributed-shared-memory (DSM) protocols that provide fault tolerance on broadcast-based and switch-based architectures with no decrease in performance. These augmented DSM protocols combine the data duplication required for fault tolerance with the data duplication that arises naturally in distributed-shared-memory implementations. The recovery memory at each backup node is kept consistent at all times and is accessible by all processes executing at that node. Simulation results show that the additional data duplication needed to make the DSM fault tolerant causes no reduction in system performance during normal operation and eliminates most of the overhead of checkpoint creation. Data blocks that are duplicated to maintain the recovery memory are also used by the DSM protocol, which reduces network traffic and significantly increases processor utilization. We use simulation and multiprocessor address trace files to compare the performance of a broadcast architecture called the SOME-Bus with that of two representative switch architectures.
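The central idea, that a backup copy kept for recovery can also serve ordinary reads at the backup node, can be illustrated with a toy model. The sketch below is not the paper's protocol: the class names, the modulo placement of home and backup nodes, and the remote-read counter are all assumptions made for illustration, and it ignores coherence messages, checkpoints, and timing entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node holding blocks it owns plus backup copies."""
    node_id: int
    home: dict = field(default_factory=dict)      # blocks owned by this node
    recovery: dict = field(default_factory=dict)  # backup copies of a neighbour's blocks


class ToyDSM:
    """Toy model: every write updates the home copy and one backup copy."""

    def __init__(self, num_nodes):
        self.nodes = [Node(i) for i in range(num_nodes)]
        self.remote_reads = 0  # reads that had to leave the reading node

    def home_of(self, block):
        # Assumed placement rule: block number modulo node count.
        return block % len(self.nodes)

    def backup_of(self, block):
        # Assumed placement rule: the backup is the next node after the home.
        return (self.home_of(block) + 1) % len(self.nodes)

    def write(self, block, value):
        # Keep the recovery memory consistent on every write.
        self.nodes[self.home_of(block)].home[block] = value
        self.nodes[self.backup_of(block)].recovery[block] = value

    def read(self, reader, block):
        node = self.nodes[reader]
        if block in node.home:
            return node.home[block]
        if block in node.recovery:
            # The backup copy doubles as a locally readable copy.
            return node.recovery[block]
        self.remote_reads += 1
        return self.nodes[self.home_of(block)].home[block]

    def recover(self, failed):
        # Rebuild the failed node's home memory from its backup node.
        backup = self.nodes[(failed + 1) % len(self.nodes)]
        return {b: v for b, v in backup.recovery.items()
                if self.home_of(b) == failed}


dsm = ToyDSM(4)
dsm.write(5, "x")           # home is node 1, backup is node 2
local = dsm.read(2, 5)      # served from node 2's recovery memory
remote = dsm.read(3, 5)     # node 3 holds no copy, so this read goes remote
restored = dsm.recover(1)   # contents recoverable if node 1 fails
```

In this toy, the read at node 2 generates no remote traffic because the copy kept there for recovery is also readable, which is the kind of traffic reduction the abstract describes; the magnitude of any such effect in a real system is what the paper's simulations measure.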