Systematic Design of Fault-Tolerant Multiprocessors with Shared Buses

Authors:
Hung-Kuei Ku; John P. Hayes
Affiliations:
AT&T Bell, Middletown, NJ; Univ. of Michigan, Ann Arbor
Venue:
IEEE Transactions on Computers
Year:
1997

Abstract

A multiprocessor system is fault-tolerant (FT) if it preserves a fault-free subsystem of a predetermined interconnection structure when faults appear. We present a new method for designing FT multiprocessors that can efficiently tolerate both processor and interconnection faults. The approach is general, in that it can be applied to any multiprocessor topology. Shared buses serve as the main interconnection mechanism to minimize the switching logic needed for reconfiguration. We employ processor-bus-link (PBL) graphs to model multiprocessors with either dedicated or shared buses. Both processors and buses are represented as nodes so that bus faults can be considered explicitly and tolerated efficiently by spare buses instead of by spare processors. A minimum number of spare processors and buses are used to reduce hardware overhead. The node covering concept and the maximum-weight spanning tree algorithm are then employed to construct FT systems that have lower interconnection cost than most previous designs. We also present a cost-effective implementation method which is suitable for both static and dynamic reconfiguration techniques. The FT systems obtained have the advantages of no critical single point of failure, low redundancy, local replacement, and simple circuitry for fast reconfiguration.
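The modeling idea in the abstract can be illustrated with a small sketch. It is not the authors' construction: the class name PBLGraph, the function names, the example topology, and the link weights are all hypothetical, and the paper's weighting scheme and node-covering step are not reproduced here. The sketch only shows (a) processors and buses both represented as nodes, so that a bus fault is handled by removing a bus node, and (b) a generic maximum-weight spanning tree computed with Kruskal's algorithm on descending weights.

```python
class PBLGraph:
    """Toy processor-bus-link graph: processors and buses are both nodes.

    Hypothetical illustration only; links join one processor to one bus.
    """

    def __init__(self):
        self.kind = {}    # node name -> "processor" or "bus"
        self.links = []   # list of (weight, processor, bus)

    def add_node(self, name, kind):
        if kind not in ("processor", "bus"):
            raise ValueError("kind must be 'processor' or 'bus'")
        self.kind[name] = kind

    def add_link(self, processor, bus, weight=1):
        if self.kind[processor] != "processor" or self.kind[bus] != "bus":
            raise ValueError("a link must join a processor to a bus")
        self.links.append((weight, processor, bus))

    def without(self, faulty):
        """Return the subgraph left after removing the faulty nodes."""
        survivor = PBLGraph()
        for name, kind in self.kind.items():
            if name not in faulty:
                survivor.add_node(name, kind)
        for weight, processor, bus in self.links:
            if processor not in faulty and bus not in faulty:
                survivor.add_link(processor, bus, weight)
        return survivor


def maximum_weight_spanning_tree(graph):
    """Kruskal's algorithm with links taken in descending weight order.

    Returns a list of (processor, bus, weight). If the graph is
    disconnected, the result is a maximum-weight spanning forest.
    """
    parent = {name: name for name in graph.kind}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, processor, bus in sorted(graph.links, key=lambda e: -e[0]):
        root_p, root_b = find(processor), find(bus)
        if root_p != root_b:          # link joins two components: keep it
            parent[root_p] = root_b
            tree.append((processor, bus, weight))
    return tree


def build_example():
    """Three processors on two shared buses; weights are made up."""
    g = PBLGraph()
    for p in ("P0", "P1", "P2"):
        g.add_node(p, "processor")
    for b in ("B0", "B1"):
        g.add_node(b, "bus")
    for p, b, w in [("P0", "B0", 3), ("P1", "B0", 2), ("P1", "B1", 5),
                    ("P2", "B1", 1), ("P2", "B0", 4), ("P0", "B1", 1)]:
        g.add_link(p, b, w)
    return g
```

Calling maximum_weight_spanning_tree(build_example()) keeps four links that connect all five nodes. Calling build_example().without({"B1"}) models a bus fault by deleting the bus node directly, leaving the three processors connected through B0. This is the sense in which representing buses as nodes lets bus faults be considered explicitly.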