The fault tolerance of multiprocessor systems with multistage interconnection networks is studied under multiple faults in the network. Fault tolerance is analyzed with respect to the dynamic full access (DFA) property of the processors in the system. A characterization of multiple faults in the Omega network is introduced and used to develop simple tests for DFA capability under a given set of faults. The DFA capability is shown to be maintained under a large number of faults. At most three passes are shown to be sufficient for communication between any two processors in the system when the faults satisfy certain easily checked conditions. When these conditions do not hold, at most $\log_2 N - 2$ passes through the network are shown to be sufficient, provided a set of weaker conditions is satisfied. Techniques for routing data between processing elements through the faulty network are described. The results are also extended to general $k$-stage shuffle/exchange networks with $k \ge \log_2 N$. These techniques allow a multiprocessor system to continue operating in the presence of network faults, with full connectivity among its processing elements and minimal loss of network throughput.
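To make the notion of passes concrete, the following is a minimal brute-force sketch, not the simple tests developed in the paper. It assumes an Omega network with N = 2^n lines and n stages, each a perfect shuffle followed by 2x2 switches, and a hypothetical fault model in which a fault disables one switch output link, named by a `(stage, line)` pair. It also takes DFA to mean that every processor can reach every other in a finite number of passes, with output i fed back to processor i between passes. The function names are illustrative only.

```python
from collections import deque

def shuffle(p, n):
    """Perfect shuffle: rotate the n-bit line address p left by one bit."""
    mask = (1 << n) - 1
    return ((p << 1) | (p >> (n - 1))) & mask

def one_pass_reach(src, n, faulty_links):
    """Outputs reachable from input src in a single pass.

    faulty_links is a set of (stage, line) pairs; each marks an unusable
    switch output link (an assumed fault model, not the paper's).
    """
    positions = {src}
    for stage in range(n):
        nxt = set()
        for p in positions:
            s = shuffle(p, n)
            # A 2x2 switch can route straight (s) or exchange (s ^ 1).
            for out in (s, s ^ 1):
                if (stage, out) not in faulty_links:
                    nxt.add(out)
        positions = nxt
    return positions

def max_passes(n, faulty_links):
    """Worst-case number of passes over all source/destination pairs.

    Returns None when some destination is unreachable from some source,
    i.e. when dynamic full access is lost.
    """
    size = 1 << n
    reach = [one_pass_reach(s, n, faulty_links) for s in range(size)]
    worst = 0
    for src in range(size):
        # Breadth-first search over the "one pass" relation.
        dist = {d: 1 for d in reach[src]}
        queue = deque(dist)
        while queue:
            u = queue.popleft()
            for v in reach[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < size:
            return None
        worst = max(worst, max(dist.values()))
    return worst
```

For example, with n = 3 a fault-free network needs one pass, a faulty link at `(0, 0)` forces sources 0 and 4 to use a second pass for some destinations, and a faulty link at `(2, 0)` cuts off destination 0 entirely, so DFA is lost. The search does work polynomial in N, which is the cost the paper's characterization-based tests are meant to avoid.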