Fault diagnosis is treated as two distinct processes: fault discovery and dissemination of diagnostic information. Previous research determined what level of self-diagnosability a given set of tests achieves in a homogeneous system, using a model in which only node failures occur and test coverage is complete. Adopting the same model, a new methodology is presented that minimizes the overhead associated with periodic testing. The method diagnoses up to c - 1 faults, where c is the connectivity of the system topology. The savings in testing hold when processor failure rates are low. Environments with high processor failure rates are also examined. It is shown that adopting the proposed methodology for such systems results in greater reliability while maintaining the same effective processing power.
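The fault bound in the abstract, one less than the connectivity c of the system topology, can be illustrated with a small sketch. This is not the paper's diagnosis algorithm: it only computes the vertex connectivity of a small example topology by brute force and derives the bound from it. The function names (`hypercube`, `vertex_connectivity`, `max_diagnosable_faults`) and the choice of a 3-dimensional hypercube are assumptions made for illustration.

```python
from itertools import combinations


def hypercube(dim):
    """Adjacency sets of a dim-dimensional hypercube (illustrative topology only)."""
    return {v: {v ^ (1 << b) for b in range(dim)} for v in range(1 << dim)}


def is_connected(adj, removed):
    """Check whether the nodes not in `removed` still form one connected component."""
    alive = [v for v in adj if v not in removed]
    if not alive:
        return True
    seen = {alive[0]}
    stack = [alive[0]]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(alive)


def vertex_connectivity(adj):
    """Smallest number of nodes whose removal disconnects the graph.

    Brute force over node subsets, so only suitable for small graphs.
    A complete graph on n nodes is assigned connectivity n - 1 by convention.
    """
    nodes = list(adj)
    n = len(nodes)
    for k in range(n - 1):
        for subset in combinations(nodes, k):
            if not is_connected(adj, set(subset)):
                return k
    return n - 1


def max_diagnosable_faults(adj):
    """Fault bound c - 1 from the abstract, clamped at zero."""
    return max(0, vertex_connectivity(adj) - 1)


if __name__ == "__main__":
    cube = hypercube(3)
    print(vertex_connectivity(cube))     # 3
    print(max_diagnosable_faults(cube))  # 2
```

For the 3-dimensional hypercube every node has three neighbours and no two-node removal disconnects the graph, so c = 3 and the bound is two faulty nodes. For larger topologies a max-flow based connectivity routine would replace the brute-force search.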