The concept of fault-tolerant self-diagnostics is introduced for distributed systems, and it is shown that there is a performance tradeoff between the complexity of a self-diagnostic algorithm and the level of fault tolerance the algorithm achieves. Hypercube systems are then considered, and it is shown that designing an optimal algorithm for them has an equivalent coding-theory formulation that is NP-hard. An efficient diagnostic scheme based on a combinatorial structure called the Hadamard matrix is proposed for these systems, and its performance tradeoff is studied. The tradeoff between fault tolerance and traffic complexity of the proposed algorithm is evaluated for small hypercubes, and an interesting compromise is exhibited for hypercubes of arbitrary size.
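The abstract names the Hadamard matrix but does not say how the scheme uses it. As background only, here is a minimal sketch of the standard Sylvester construction of a Hadamard matrix of order 2^n. Its rows and columns can be indexed by the n-bit node labels of an n-dimensional hypercube, with entry (x, y) equal to (-1) raised to the number of 1 bits in `x & y`. The sketch checks two textbook properties: distinct rows are orthogonal, and distinct rows therefore differ in exactly 2^(n-1) positions. That large pairwise Hamming distance is what makes Hadamard rows useful as codewords. All function names are hypothetical, and this is not the paper's diagnostic algorithm.

```python
def sylvester_hadamard(n):
    """Return the 2**n x 2**n Sylvester Hadamard matrix as nested lists.

    Row/column indices are n-bit labels (hypercube node addresses);
    entry (x, y) is +1 if x & y has an even number of 1 bits, else -1.
    """
    size = 1 << n
    return [[-1 if bin(x & y).count("1") % 2 else 1 for y in range(size)]
            for x in range(size)]


def is_hadamard(h):
    """Check H * H^T == m * I, i.e. rows are pairwise orthogonal +/-1 vectors."""
    m = len(h)
    for i in range(m):
        for j in range(m):
            dot = sum(a * b for a, b in zip(h[i], h[j]))
            if dot != (m if i == j else 0):
                return False
    return True


def min_row_distance(h):
    """Minimum Hamming distance between distinct rows (requires >= 2 rows)."""
    m = len(h)
    return min(sum(a != b for a, b in zip(h[i], h[j]))
               for i in range(m) for j in range(i + 1, m))


def hypercube_neighbors(x, n):
    """Nodes adjacent to x in an n-cube: flip each of the n address bits."""
    return [x ^ (1 << k) for k in range(n)]


if __name__ == "__main__":
    H = sylvester_hadamard(3)          # 8 x 8, one row per node of a 3-cube
    print(is_hadamard(H))              # True
    print(min_row_distance(H))         # 4, i.e. 2**(3-1)
    print(hypercube_neighbors(5, 3))   # [4, 7, 1]
```

Computing entries directly from bit labels, rather than by recursive block doubling, keeps the correspondence between matrix indices and hypercube addresses explicit. Both methods produce the same Sylvester matrix.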