Routing in Modular Fault-Tolerant Multiprocessor Systems

Authors:
M. Sultan Alam;Rami G. Melhem
Affiliations:
-;-
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1995

Abstract

In this paper, we consider a class of modular multiprocessor architectures in which spares are added to each module to cover for faulty nodes within that module, thus forming a fault-tolerant basic block (FTBB). In contrast to reconfiguration techniques that preserve the physical adjacency between active nodes in the system, our goal is to preserve the logical adjacency between active nodes by means of a routing algorithm that delivers messages successfully to their destinations. We introduce two-phase routing strategies that route a message first to its destination FTBB and then to the destination node within that FTBB. Such a strategy may be applied to a variety of architectures, including binary hypercubes and three-dimensional tori. In the presence of f faults in hypercubes and tori, we show that the worst-case length of the message route is min{σ + f, (K + 1)σ} + c, where σ is the length of the shortest path in the absence of faults, K is the number of spare nodes in an FTBB, and c is a small constant. The average routing overhead is much lower than the worst-case overhead.
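As a rough illustration of the worst-case bound stated in the abstract, the following Python sketch evaluates it for given parameters. The function name and argument names are hypothetical, and the abstract does not give a value for the constant c, so the value used in the example is an arbitrary placeholder rather than a figure from the paper.

```python
def worst_case_route_length(shortest: int, faults: int, spares: int, c: int) -> int:
    """Evaluate min{shortest + faults, (spares + 1) * shortest} + c.

    shortest: fault-free shortest-path length between source and destination
    faults:   number of faults f in the system
    spares:   number of spare nodes K in each FTBB
    c:        the "small constant" from the abstract (value not given there)
    """
    if min(shortest, faults, spares, c) < 0:
        raise ValueError("all parameters must be non-negative")
    return min(shortest + faults, (spares + 1) * shortest) + c


# c = 2 is an arbitrary placeholder, not a value taken from the paper.
few_faults = worst_case_route_length(shortest=4, faults=3, spares=2, c=2)
many_faults = worst_case_route_length(shortest=4, faults=20, spares=2, c=2)
print(few_faults, many_faults)
```

With few faults the additive term (shortest path plus f) is the smaller one. Once f exceeds K times the shortest-path length, the multiplicative term takes over, so the bound stops growing with the number of faults.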