We discuss the problem of routing messages on hypercubes that have faulty processors and/or communication links. We are motivated by the belief that simple algorithms, operating under simple assumptions, can ensure a high probability of successful message routing. In this paper, we consider the basic problem of routing a single message from an arbitrary source to an arbitrary destination. A fault is assumed to render the processor or link non-functional for the purpose of communicating messages. Consequently, communication hot spots can also be treated as node faults, and our results also apply to routing in congested hypercubes.

A framework for the analysis of fault-tolerant routing schemes on a hypercube is presented. This framework encompasses different routing schemes, routing information models, and fault distribution models. We calculate the a priori probability of successfully routing a single, indivisible message under each possible combination of these assumptions. Using random routing under the one-step local information routing model, we show that the a priori probability of successful message routing remains high even for an exceedingly large number of faults. We also analyze the behavior of sidetracking, a routing method that combines the concepts of local information and randomization. Under sidetracking, again in the one-step local information routing model, a message is routed forward using random routing. If the message reaches a blocked processor (one with no non-faulty neighbors along a minimal path to the destination), it is sent to a non-faulty neighbor chosen uniformly at random from the set of non-faulty neighbors. We use simulation experiments to evaluate the performance of this routing scheme, measuring both the probability of successful routing and the expected path length of a routed message.
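As a minimal sketch of how such an a priori probability can be computed, the Python below assumes one concrete fault distribution model: every node other than the source and destination fails independently with probability `q`, links never fail, and the source and destination are non-faulty. Under random routing with one-step local information, a node forwards the message to a uniformly chosen non-faulty neighbor that is one hop closer to the destination, and routing fails if no such neighbor exists. The function names and the exhaustive enumeration over fault patterns are illustrative choices, not the paper's method, and are only practical for very small cubes.

```python
from functools import lru_cache
from itertools import product


def random_routing_success(n, faulty, src, dst):
    """Probability that random routing delivers a message from src to dst in an
    n-cube whose faulty nodes are the set `faulty` (links assumed reliable)."""

    @lru_cache(maxsize=None)
    def p(v):
        if v == dst:
            return 1.0
        diff = v ^ dst
        # Non-faulty neighbours one hop closer to dst (flip a bit where v, dst differ).
        good = [v ^ (1 << i) for i in range(n)
                if (diff >> i) & 1 and (v ^ (1 << i)) not in faulty]
        if not good:
            return 0.0  # blocked: no usable minimal-path neighbour
        # Uniform random choice among the usable forward neighbours.
        return sum(p(w) for w in good) / len(good)

    return p(src)


def a_priori_success(n, q, src=0, dst=None):
    """Exact a priori success probability, averaging over all fault patterns in
    which each node other than src and dst fails independently with prob. q."""
    if dst is None:
        dst = (1 << n) - 1  # antipodal destination by default
    others = [v for v in range(1 << n) if v not in (src, dst)]
    total = 0.0
    for pattern in product((False, True), repeat=len(others)):
        faulty = {v for v, is_bad in zip(others, pattern) if is_bad}
        k = len(faulty)
        weight = q ** k * (1 - q) ** (len(others) - k)
        total += weight * random_routing_success(n, faulty, src, dst)
    return total


if __name__ == "__main__":
    for q in (0.1, 0.2, 0.3):
        print(q, a_priori_success(3, q))
```

For example, `a_priori_success(2, q)` evaluates to 1 - q^2 for the antipodal pair in a 2-cube, since the message fails only when both intermediate nodes are faulty.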
The empirical performance of the sidetracking algorithms strongly indicates that, for a fixed probability of node failure, the probability of successful message routing approaches 100% as the cube dimension grows.
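A simulation experiment of this kind can be approximated with a small Monte Carlo sketch. It assumes the same independent node-fault model (failure probability `q`, non-faulty source and destination, no link faults), routes between the antipodal nodes `0` and `2^n - 1`, draws a fresh fault pattern for each message and samples each node's state lazily so that large cubes need not be materialized, and declares failure after a hypothetical cut-off of `4n` hops, since the abstract does not say how looping messages are handled. `sidetrack_route`, `estimate`, and `max_hops` are illustrative names, not from the paper.

```python
import random


def sidetrack_route(n, q, rng, max_hops=None):
    """Route one message from node 0 to node 2**n - 1 using sidetracking.

    Returns the number of hops on success, or None on failure.
    """
    if max_hops is None:
        max_hops = 4 * n  # hypothetical cut-off for looping messages
    src, dst = 0, (1 << n) - 1
    # Fault states are sampled on first inspection (independent, prob. q);
    # the endpoints are assumed to be non-faulty.
    status = {src: False, dst: False}

    def is_faulty(v):
        if v not in status:
            status[v] = rng.random() < q
        return status[v]

    v, hops = src, 0
    while v != dst:
        if hops >= max_hops:
            return None
        diff = v ^ dst
        # One-step local information: only v's direct neighbours are inspected.
        forward = [v ^ (1 << i) for i in range(n)
                   if (diff >> i) & 1 and not is_faulty(v ^ (1 << i))]
        if forward:
            v = rng.choice(forward)  # random routing along a minimal path
        else:
            # Blocked: sidetrack to any non-faulty neighbour, chosen uniformly.
            healthy = [v ^ (1 << i) for i in range(n) if not is_faulty(v ^ (1 << i))]
            if not healthy:
                return None
            v = rng.choice(healthy)
        hops += 1
    return hops


def estimate(n, q, trials=2000, seed=1):
    """Estimate (success probability, mean hops of successful routes)."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(trials):
        hops = sidetrack_route(n, q, rng)
        if hops is not None:
            lengths.append(hops)
    success = len(lengths) / trials
    mean_len = sum(lengths) / len(lengths) if lengths else float("nan")
    return success, mean_len


if __name__ == "__main__":
    for n in (6, 10, 14):
        print(n, estimate(n, 0.2))
```

Calling `estimate(n, 0.2)` for increasing `n` shows how the estimated success rate and mean path length change with cube dimension under these assumptions; with `q = 0` every message takes exactly `n` hops.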