Chip-Level Redundancy in Distributed Shared-Memory Multiprocessors

Authors:
Brian T. Gold; Babak Falsafi; James C. Hoe
Venue:
PRDC '09: Proceedings of the 2009 15th IEEE Pacific Rim International Symposium on Dependable Computing
Year:
2009

Abstract

Distributed shared-memory (DSM) multiprocessors provide a scalable hardware platform, but lack the redundancy necessary for mainframe-level reliability and availability. Chip-level redundancy in a DSM server faces a key challenge: the increased latency of checking results among redundant components. To address the resulting performance overhead, we propose a checking filter that reduces the number of checking operations on the critical path of execution. Furthermore, we propose to decouple checking operations from the coherence protocol, which simplifies the implementation and permits reuse of existing coherence controller hardware. Our simulations of commercial workloads indicate that the average performance overhead is within 4% (9% maximum) of tightly coupled dual modular redundancy (DMR) solutions.