Hypercube message routing in the presence of faults

Authors:
Jesse M. Gordon;Quentin F. Stout
Affiliations:
Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, MI, USA;Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, MI, USA
Venue:
C3P Proceedings of the third conference on Hypercube concurrent computers and applications: Architecture, software, computer systems, and general issues - Volume 1
Year:
1988

Abstract

We discuss the problem of routing messages on hypercubes which have faulty processors and/or communication links. We are motivated by the belief that simple algorithms, operating under simple assumptions, can ensure high probabilities of successful message routing. In this paper, we consider the basic problem of routing a single message from an arbitrary source to an arbitrary destination. In our study, a fault is assumed to render the processor or link non-functional for purposes of communicating messages. As such, we may also consider communications hot spots as node faults, and our results also apply to routing in congested hypercubes.
A framework for the analysis of fault-tolerant routing schemes on a hypercube is presented. This framework includes differing routing schemes, routing information models and fault distribution models. The a priori probabilities of successful routing of a single, indivisible message under each of our possible sets of assumptions are calculated. Using random routing, under the one-step local information routing model, we show that the a priori probability of successful message routing is high even for an exceedingly large number of faults.
We also analyze the behavior of sidetracking, a routing method which combines the concepts of local information and randomization. Using sidetracking, and in the one-step local information routing model, a message will be routed forward using random routing. If the message reaches a blocked processor (no non-faulty neighbors along a minimal path to the destination) it will be sent to a non-faulty neighbor, chosen uniformly at random from the set of non-faulty neighbors. We use simulation experiments to determine the performance of this routing scheme, analyzing the probability of successful routing and the expected path length of a routed message.
The empirical performance of the sidetracking algorithms indicates strongly that, in the limit as the cube dimension grows larger and for a fixed probability of node failure, the probability of successful message routing is 100%.
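
The sidetracking rule described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' simulator. It assumes node faults only (no link faults), independent node failures with a fixed probability, a source and destination that are never faulty, and a hop limit to force termination; the abstract specifies none of these details. The names `sidetrack_route`, `estimate_success`, `max_hops`, and `p_fault` are hypothetical.

```python
import random


def sidetrack_route(n, src, dst, faulty, max_hops, rng):
    """Route one message on an n-cube; return hop count, or None on failure.

    Assumption: each node knows only which of its own neighbors are faulty
    (one-step local information). max_hops is an illustrative cutoff.
    """
    cur, hops = src, 0
    while cur != dst:
        if hops >= max_hops:
            return None
        # Neighbors differ from cur in exactly one bit.
        alive = [cur ^ (1 << i) for i in range(n)
                 if (cur ^ (1 << i)) not in faulty]
        if not alive:
            return None  # every neighbor is faulty
        diff = cur ^ dst
        # A neighbor lies on a minimal path iff the flipped bit is set in diff.
        forward = [v for v in alive if (v ^ cur) & diff]
        # Random forward step if possible; otherwise sidetrack to any
        # non-faulty neighbor chosen uniformly at random.
        cur = rng.choice(forward if forward else alive)
        hops += 1
    return hops


def estimate_success(n, p_fault, trials, seed=0):
    """Estimate (success rate, mean hops of successful routes)."""
    rng = random.Random(seed)
    size = 1 << n
    successes, total_hops = 0, 0
    for _ in range(trials):
        src, dst = rng.sample(range(size), 2)
        # Assumption: source and destination are always non-faulty.
        faulty = {v for v in range(size)
                  if v not in (src, dst) and rng.random() < p_fault}
        hops = sidetrack_route(n, src, dst, faulty, 4 * n, rng)
        if hops is not None:
            successes += 1
            total_hops += hops
    rate = successes / trials
    mean_hops = total_hops / successes if successes else float("nan")
    return rate, mean_hops


if __name__ == "__main__":
    for n in (6, 8, 10):
        print(n, estimate_success(n, p_fault=0.2, trials=200))
```

Running the script prints an estimated success rate and mean path length for a few cube dimensions at a fixed failure probability. The numbers depend on the assumed hop limit and fault model, so they should not be read as reproducing the paper's results.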