This paper examines routing and broadcasting algorithms for hypercube computers subject to node failures. First, some simple message-passing algorithms are described that perform well with certain fault patterns but poorly with others. The concept of an unsafe node is then introduced to identify fault-free nodes that may cause communication difficulties in faulty hypercubes. It is shown that routing and broadcasting can be substantially simplified by using only “feasible” paths, which try to avoid unsafe nodes. Each active node is assumed to know the fault status of all nodes within a specified radius k of itself. A computationally efficient routing algorithm is presented that routes a message along a path of length at most p + 2, where p is the minimum feasible distance from the source to the destination, provided that k = 1 and not all non-faulty nodes in the hypercube are unsafe. We further show that, under the same conditions, broadcasting can be completed in only one more time unit than in the fault-free case.
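To make the notion of an unsafe node concrete, here is a minimal Python sketch. The abstract does not state the labeling rule, so the rule used here is an assumption: a non-faulty node is treated as unsafe if it has at least two faulty neighbors, or at least three neighbors that are faulty or unsafe, with labels propagated until nothing changes. The names `unsafe_nodes` and `safe_path` are hypothetical. `safe_path` is a plain breadth-first search that uses global fault information, so it is not the paper's routing algorithm (which relies only on radius-1 status); it only gives a brute-force reference for the shortest path whose intermediate nodes are all non-faulty and safe, which is one plausible reading of a “feasible” path.

```python
from collections import deque
from typing import Optional


def neighbors(node: int, n: int) -> list[int]:
    """Nodes of an n-cube adjacent to `node` (addresses differ in one bit)."""
    return [node ^ (1 << i) for i in range(n)]


def unsafe_nodes(n: int, faulty: set[int]) -> set[int]:
    """Label non-faulty nodes unsafe, repeating until no label changes.

    ASSUMED rule (not stated in the abstract): a non-faulty node is unsafe
    if it has >= 2 faulty neighbors, or >= 3 faulty-or-unsafe neighbors.
    """
    unsafe: set[int] = set()
    changed = True
    while changed:
        changed = False
        for v in range(1 << n):
            if v in faulty or v in unsafe:
                continue
            nbrs = neighbors(v, n)
            n_faulty = sum(u in faulty for u in nbrs)
            n_bad = sum(u in faulty or u in unsafe for u in nbrs)
            if n_faulty >= 2 or n_bad >= 3:
                unsafe.add(v)
                changed = True
    return unsafe


def safe_path(n: int, faulty: set[int], unsafe: set[int],
              src: int, dst: int) -> Optional[list[int]]:
    """BFS for a shortest src->dst path avoiding faulty nodes and using
    unsafe nodes only as the destination (a hypothetical 'feasible' path)."""
    if src in faulty or dst in faulty:
        return None
    prev: dict[int, Optional[int]] = {src: None}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        if v == dst:
            # Walk the predecessor links back to the source.
            path: list[int] = []
            cur: Optional[int] = v
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for u in neighbors(v, n):
            if u in prev or u in faulty:
                continue
            if u in unsafe and u != dst:
                continue  # never relay through an unsafe node
            prev[u] = v
            queue.append(u)
    return None
```

For example, in a 3-cube with faulty nodes 3 (011) and 5 (101), nodes 1 and 7 each have two faulty neighbors and are labeled unsafe under the assumed rule; `safe_path(3, {3, 5}, {1, 7}, 0, 7)` then returns `[0, 2, 6, 7]`, touching the unsafe node 7 only as the final hop.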