In this paper we study methods for routing data in supercomputers that use multistage interconnection networks (MINs) in the presence of faulty network components. These methods are applicable to existing multiprocessors such as the IBM GF11 and RP3. They are based on the concept of dynamic full-access (DFA), which refers to the ability of the network to route data from any processor in the system to any other processor in a finite number of passes through the network. We introduce a graph model, called the DFA graph of a MIN, and show how it can be used to determine the DFA capability of the MIN under a given set of network faults. When the faults in the network satisfy certain special properties, we present algorithms for routing any arbitrary permutation in a faulty Beneš network and any Omega permutation in a faulty Omega network. These algorithms are simple and operate in a distributed fashion. These techniques allow a supercomputer to efficiently realize the data permutations needed in a parallel computing environment despite the presence of faults in the network.
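
The DFA notion can be illustrated with a small sketch. It is not the DFA graph construction of the paper, which the abstract does not spell out. It rests on three assumptions made only for illustration: the MIN is an Omega network with destination-tag routing, a fault disables one output link of one stage, and processor i is attached to both input i and output i so that data can recirculate. Under these assumptions, the network has DFA exactly when the one-pass reachability graph is strongly connected. The names `omega_path`, `one_pass_graph`, and `has_dfa` are hypothetical.

```python
from collections import deque

def omega_path(src, dst, n):
    """Output links (stage, line) used by the unique Omega path src -> dst."""
    mask = (1 << n) - 1
    cur, links = src, []
    for stage in range(n):
        dbit = (dst >> (n - 1 - stage)) & 1   # destination-tag bit for this stage
        cur = ((cur << 1) & mask) | dbit      # perfect shuffle, then switch sets low bit
        links.append((stage, cur))
    return links

def one_pass_graph(n, faulty_links):
    """Edge s -> d exists if the unique path avoids every faulty link."""
    size = 1 << n
    return {s: [d for d in range(size)
                if not any(l in faulty_links for l in omega_path(s, d, n))]
            for s in range(size)}

def reaches_all(graph, start):
    """Breadth-first search: does `start` reach every vertex?"""
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(graph)

def has_dfa(n, faulty_links):
    """Assumed criterion: DFA holds iff the one-pass graph is strongly connected."""
    g = one_pass_graph(n, faulty_links)
    rev = {u: [] for u in g}
    for u, vs in g.items():
        for v in vs:
            rev[v].append(u)
    return reaches_all(g, 0) and reaches_all(rev, 0)
```

In an 8-processor network (`n = 3`), the fault set `{(0, 0)}` prevents processor 0 from reaching processors 0 to 3 in one pass, yet `has_dfa(3, {(0, 0)})` still holds because a second pass through another processor reaches them. A faulty last-stage link such as `(2, 3)` isolates output 3 and destroys DFA.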