Performance of Fault-Tolerant Diagnostics in the Hypercube Systems

Authors:
A. Ghafoor; P. Solé
Affiliations:
Syracuse Univ., Syracuse, NY; Syracuse Univ., Syracuse, NY
Venue:
IEEE Transactions on Computers
Year:
1989


Abstract

The concept of fault-tolerant self-diagnostics is introduced for distributed systems, and a performance tradeoff is shown to exist between the complexity of a self-diagnostic algorithm and the level of fault tolerance achieved by the algorithm. For hypercube systems, designing an optimal algorithm is shown to have an equivalent coding-theory formulation that belongs to the class of NP-hard problems. An efficient diagnostic scheme, based on a combinatorial structure called the Hadamard matrix, is proposed for these systems, and its performance tradeoff is studied. The tradeoff between fault tolerance and traffic complexity of the proposed algorithm is evaluated for hypercubes of small size, and an interesting compromise is exhibited for hypercubes of arbitrary size.
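The abstract names two objects, the hypercube and the Hadamard matrix, without giving their constructions. The following is a minimal Python sketch of both, offered only as background: it is not the paper's diagnostic algorithm, and the function names (`hypercube_neighbors`, `sylvester_hadamard`, `rows_as_binary_words`, `hamming`) are hypothetical. It assumes the Sylvester construction, which yields Hadamard matrices only of order 2^k; the abstract does not say which construction the authors use.

```python
from itertools import combinations


def hypercube_neighbors(node: int, n: int) -> list[int]:
    """Neighbors of `node` in the n-dimensional hypercube.

    Nodes are labeled 0 .. 2**n - 1; two nodes are adjacent when
    their binary labels differ in exactly one bit.
    """
    return [node ^ (1 << i) for i in range(n)]


def sylvester_hadamard(k: int) -> list[list[int]]:
    """Hadamard matrix of order 2**k via the Sylvester construction.

    Assumption: Sylvester doubling H -> [[H, H], [H, -H]] is used here
    purely for illustration.
    """
    h = [[1]]
    for _ in range(k):
        top = [row + row for row in h]
        bottom = [row + [-x for x in row] for row in h]
        h = top + bottom
    return h


def rows_as_binary_words(h: list[list[int]]) -> list[list[int]]:
    """Map entries +1 -> 0 and -1 -> 1 so each row becomes a binary word."""
    return [[0 if x == 1 else 1 for x in row] for row in h]


def hamming(a: list[int], b: list[int]) -> int:
    """Number of positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    print(hypercube_neighbors(0b000, 3))
    words = rows_as_binary_words(sylvester_hadamard(3))
    # Distinct rows of a Hadamard matrix of order m are orthogonal,
    # so the corresponding binary words differ in exactly m/2 positions.
    print({hamming(a, b) for a, b in combinations(words, 2)})
```

The constant pairwise distance between rows is the property that makes Hadamard matrices a natural source of codes; how the paper maps this onto its diagnostic scheme is not stated in the abstract.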