A hypercube can continue operating in a gracefully degraded manner after faults arise by running parallel algorithms on smaller fault-free subcubes. The execution time of a parallel algorithm tends to depend on the dimension of the subcube it is assigned. Reducing slowdown in a hypercube with given faults therefore requires identifying its maximum healthy subcubes. This paper describes an efficient procedure that determines all maximum fault-free subcubes in a faulty hypercube. The procedure is distributed: every healthy node adjacent to a failed component executes the same procedure independently and concurrently. By exploiting properties of faulty hypercubes, the procedure empirically exhibits time complexity polynomial in the system dimension and the number of faults over a practical range of dimensions. It compares favorably with prior methods when the number of faults is on the order of the system dimension. The procedure handles node failures and link failures uniformly and with equal efficiency.
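To make the objects in this abstract concrete, the following is a minimal sketch that finds all maximum fault-free subcubes by exhaustive search. It is a centralized brute-force baseline, not the distributed procedure the paper describes, and it takes time exponential in the dimension because it examines all 3^n subcubes. It assumes the common ternary-string notation, in which a subcube of an n-cube is a string over {0, 1, *} and each * marks a free dimension. To treat node and link faults uniformly, it further assumes that a node fault is a 0-dimensional cube, that a link fault is the 1-dimensional cube spanned by its two endpoints, and that a subcube is healthy if it contains no fault cube entirely. Under that assumption, a subcube may include one endpoint of a failed link. The helper names `node_fault`, `link_fault`, `contains`, and `max_fault_free_subcubes` are hypothetical.

```python
from itertools import product


def node_fault(u: int, n: int) -> str:
    """Encode faulty node u of an n-cube as a 0-dimensional ternary cube."""
    return format(u, f"0{n}b")


def link_fault(u: int, v: int, n: int) -> str:
    """Encode faulty link (u, v) as the 1-dimensional cube spanned by its
    endpoints: the single bit in which u and v differ becomes '*'."""
    diff = u ^ v
    if diff == 0 or diff & (diff - 1):
        raise ValueError("u and v are not hypercube neighbours")
    bits = list(format(u, f"0{n}b"))
    # Bit k (value 2**k) sits at string index n - 1 - k.
    bits[n - diff.bit_length()] = "*"
    return "".join(bits)


def contains(sub: str, cube: str) -> bool:
    """True if subcube `sub` includes every node of `cube`."""
    return all(s == "*" or s == c for s, c in zip(sub, cube))


def max_fault_free_subcubes(n: int, faults: list[str]) -> tuple[int, list[str]]:
    """Brute force: return (d, cubes), where cubes lists every d-dimensional
    subcube that contains no fault cube and d is as large as possible.
    Assumption: a subcube is healthy iff it contains no fault cube entirely.
    Returns (-1, []) if every node is faulty."""
    all_cubes = ["".join(t) for t in product("01*", repeat=n)]
    for d in range(n, -1, -1):  # try the largest dimension first
        healthy = [
            c for c in all_cubes
            if c.count("*") == d and not any(contains(c, f) for f in faults)
        ]
        if healthy:
            return d, healthy
    return -1, []


if __name__ == "__main__":
    print(max_fault_free_subcubes(3, [node_fault(0, 3)]))
    # (2, ['1**', '*1*', '**1'])
    print(max_fault_free_subcubes(3, [link_fault(0, 1, 3)]))
    # (2, ['1**', '*1*', '**0', '**1'])
```

With node 000 failed in a 3-cube, the search returns three 2-dimensional subcubes. With only the link between 000 and 001 failed, it also accepts `**0`, which contains node 000 but not the faulty link. This illustrates why, under the stated assumption, a link fault is less restrictive than a node fault.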