Failure correction techniques for large disk arrays. ASPLOS III: Proceedings of the third international conference on Architectural support for programming languages and operating systems.
Introduction to algorithms.
Designing fault-tolerant systems using automorphisms. Journal of Parallel and Distributed Computing.
Some Practical Issues in the Design of Fault-Tolerant Multiprocessors. IEEE Transactions on Computers, Special issue on fault-tolerant computing.
Node-covering, Error-correcting Codes and Multiprocessors with Very High Average Fault Tolerance. IEEE Transactions on Computers.
Fault-Tolerant Meshes and Hypercubes with Minimal Numbers of Spares. IEEE Transactions on Computers.
FTCS '96: Proceedings of the Twenty-Sixth Annual International Symposium on Fault-Tolerant Computing.
Abstract: Most previous work on fault-tolerant (FT) multiprocessor design has concentrated on deterministic k-fault-tolerant (k-FT) designs, in which exactly k spare processors and some spare switches and links are added to construct multiprocessors that can tolerate any k processor faults. However, once k faults have been reconfigured around, many of the extra links and switches can remain unutilized. We show how to use the node-covering principle of Dutt and Hayes (1992) and error-correcting codes to construct probabilistic designs with very high average fault tolerance but low wiring and switch overhead. This design methodology is applicable to any multiprocessor interconnection topology. We also derive the deterministic fault tolerance of these designs and develop efficient layout strategies for them.
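To make the gap between deterministic and average fault tolerance concrete, here is a minimal Python sketch built on toy assumptions that are ours, not the paper's. In node covering, a spare can take over a primary node that it covers (roughly, a spare wired to all of that node's neighbors). We assume 7 primary nodes and 3 spares, that spares never fail, and that a faulty primary is replaced directly by one covering spare. The coverage pattern is hypothetical: primary i is covered by spare j when bit j of i + 1 is set, which follows the columns of the Hamming(7,4) parity-check matrix. This is one illustrative way a code could shape a covering assignment, not necessarily the construction in the paper. Reconfiguration is modeled as a bipartite matching of faulty primaries to distinct covering spares. "Average fault tolerance" is taken here to mean the expected number of faults, arriving in uniformly random order, that are absorbed before the first unrecoverable one. The names `COVERS`, `reconfigurable`, `deterministic_ft` and `average_ft` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, permutations

# Hypothetical toy configuration (assumption, not the paper's construction):
# 7 primary nodes, 3 spares; spares are assumed never to fail.
NUM_PRIMARIES = 7
NUM_SPARES = 3

# Primary i is covered by spare j iff bit j of (i + 1) is set, i.e. the
# coverage pattern follows the columns of the Hamming(7,4) parity-check matrix.
COVERS = {
    i: [j for j in range(NUM_SPARES) if (i + 1) >> j & 1]
    for i in range(NUM_PRIMARIES)
}


def reconfigurable(faulty, covers=COVERS):
    """True if every faulty primary can be assigned a distinct spare that
    covers it (bipartite matching by augmenting paths, Kuhn's algorithm)."""
    match = {}  # spare -> primary currently assigned to it

    def try_assign(node, seen):
        for spare in covers[node]:
            if spare in seen:
                continue
            seen.add(spare)
            # Take a free spare, or evict its holder if that node can move.
            if spare not in match or try_assign(match[spare], seen):
                match[spare] = node
                return True
        return False

    return all(try_assign(node, set()) for node in faulty)


def deterministic_ft(covers=COVERS, n=NUM_PRIMARIES):
    """Largest k such that every set of k faulty primaries is recoverable."""
    k = 0
    while k < n and all(
        reconfigurable(fs, covers) for fs in combinations(range(n), k + 1)
    ):
        k += 1
    return k


def average_ft(covers=COVERS, n=NUM_PRIMARIES):
    """Expected number of faults absorbed before the first unrecoverable one,
    with faults hitting primaries in a uniformly random order (computed
    exactly by enumerating all n! orders)."""
    orders = list(permutations(range(n)))
    total = 0
    for order in orders:
        m = 0
        while m < n and reconfigurable(order[: m + 1], covers):
            m += 1
        total += m
    return Fraction(total, len(orders))


if __name__ == "__main__":
    print(deterministic_ft())  # → 2
    print(average_ft())        # → 102/35
```

Under these assumptions any 2 faults are always recoverable, so the deterministic fault tolerance is 2. A random fault sequence is absorbed 102/35 ≈ 2.91 times on average, because only 3 of the 35 possible fault triples defeat the matching. Those are the triples whose covering spares span just two of the three spares.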