Fast and efficient submesh determination in faulty tori
HiPC'04 Proceedings of the 11th international conference on High Performance Computing
Torus/mesh-based machines have received increasing attention. Because the execution time of a parallel algorithm tends to depend on the size of the submesh assigned to it, it is natural to identify the maximum healthy submeshes in a faulty torus/mesh so as to limit performance degradation. This paper proposes an efficient approach for identifying all the maximum healthy submeshes present in a faulty torus/mesh. The approach is based on manipulating set expressions, and it reduces the search space considerably by exploiting properties of a faulty torus/mesh. The procedure is distributed: every healthy node performs the same procedure independently and concurrently. We show that the proposed scheme may outperform previous methods.
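To make the problem statement concrete, the following is a minimal brute-force sketch of what "identifying all the maximum healthy submeshes" could mean for a two-dimensional torus or mesh. It is not the set-expression method proposed in the paper, and it is centralized rather than distributed. The function name `max_healthy_submeshes`, the `(top, left, height, width)` encoding, and the reading of "maximum" as "largest node count" are assumptions made only for this illustration.

```python
from itertools import product


def max_healthy_submeshes(rows, cols, faults, wrap=True):
    """Return (size, submeshes) for the largest fault-free submeshes.

    Hypothetical helper for illustration only: an exhaustive search, not the
    paper's set-expression procedure. Each submesh is encoded as
    (top, left, height, width); with wrap=True it may cross the torus edges.
    """
    faulty = set(faults)
    best, found = 0, set()
    for h, w in product(range(1, rows + 1), range(1, cols + 1)):
        if h * w < best:
            continue  # cannot match the best size seen so far
        # A dimension spanning the whole torus covers the same nodes from any
        # start, so use a single canonical start to avoid duplicates.
        tops = [0] if h == rows else range(rows if wrap else rows - h + 1)
        lefts = [0] if w == cols else range(cols if wrap else cols - w + 1)
        for r0, c0 in product(tops, lefts):
            healthy = all(
                ((r0 + dr) % rows, (c0 + dc) % cols) not in faulty
                for dr in range(h)
                for dc in range(w)
            )
            if not healthy:
                continue
            if h * w > best:
                best, found = h * w, set()
            found.add((r0, c0, h, w))
    return best, sorted(found)
```

For example, with a single faulty node at the center of a 3x3 network, `max_healthy_submeshes(3, 3, [(1, 1)])` treats the network as a torus and can use wraparound links, whereas passing `wrap=False` restricts the search to an ordinary mesh and yields smaller submeshes. The exhaustive search grows quickly with network size, which is the kind of cost a reduced search space is meant to avoid.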