Dynamic Testing Strategy for Distributed Systems

Authors: F. J. Meyer; D. K. Pradhan
Affiliations: Univ. of Massachusetts, Amherst; Univ. of Massachusetts, Amherst
Venue: IEEE Transactions on Computers
Year: 1989

Citing 8

System diagnosis. Fault-tolerant computing: theory and techniques; Vol. 2
A Generalized Theory for System Level Diagnosis. IEEE Transactions on Computers
Distributed Diagnosis and the System User. IEEE Transactions on Computers
The Byzantine Generals Problem. ACM Transactions on Programming Languages and Systems (TOPLAS)
Graph Algorithms
Distributed fault-tolerance for large multiprocessor systems. ISCA '80 Proceedings of the 7th annual symposium on Computer Architecture
Fault tolerance in distributed computing systems and databases
Graph Theory with Applications to Engineering and Computer Science (Prentice Hall Series in Automatic Computation)

Cited 6

Implementation of Online Distributed System-Level Diagnosis Theory. IEEE Transactions on Computers - Special issue on fault-tolerant computing
Efficient Distributed Algorithms for Self Testing of Multiple Processor Systems. IEEE Transactions on Computers
Adaptive System-Level Diagnosis for Hypercube Multiprocessors. IEEE Transactions on Computers
Generating a deterministic task migration path for multiprocessor scheduling. SAC '94 Proceedings of the 1994 ACM symposium on Applied computing
Safe System Level Diagnosis. IEEE Transactions on Computers
A flexible formal framework for masking/demasking faults. Information Sciences—Informatics and Computer Science: An International Journal

Quantified Score

Hi-index	14.99

Abstract

Fault diagnosis is treated as two distinct processes: fault discovery and dissemination of diagnostic information. Previous research determined what level of self-diagnosability a given set of tests achieves in a homogeneous system, using a model in which only node failures occur and test coverage is complete. Adopting the same model, a new methodology is presented that minimizes the overhead associated with periodic testing. The method diagnoses up to c - 1 faults, where c is the connectivity of the system topology. The savings in testing hold when processor failure rates are low. Environments with high processor failure rates are also examined. It is shown that adopting the proposed methodology for such systems results in greater reliability while maintaining the same effective processing power.
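
The bound of c - 1 diagnosable faults can be illustrated with a small sketch. The Python below is not taken from the paper. It assumes that c denotes the vertex connectivity of the topology, and the helper names (`vertex_connectivity`, `max_diagnosable_faults`, `hypercube`, `ring`) are hypothetical. It computes connectivity by brute force, so it is suitable only for very small graphs.

```python
from itertools import combinations


def is_connected(adj, removed):
    """Return True if the graph stays connected after deleting `removed` nodes."""
    nodes = [v for v in adj if v not in removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(nodes)


def vertex_connectivity(adj):
    """Smallest number of nodes whose removal disconnects the graph.

    Brute force over all node subsets (exponential time); by convention a
    complete graph on n nodes has connectivity n - 1.
    """
    n = len(adj)
    for k in range(n - 1):
        for cut in combinations(adj, k):
            if not is_connected(adj, set(cut)):
                return k
    return n - 1


def max_diagnosable_faults(adj):
    """Assumed reading of the abstract: up to c - 1 faults, c = connectivity."""
    return vertex_connectivity(adj) - 1


def hypercube(dim):
    """Adjacency lists of a dim-dimensional hypercube (example topology)."""
    return {v: [v ^ (1 << b) for b in range(dim)] for v in range(1 << dim)}


def ring(n):
    """Adjacency lists of an n-node ring (example topology)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
```

Under these assumptions, a 3-dimensional hypercube has connectivity 3, so the bound allows up to 2 faults, while a 5-node ring has connectivity 2 and allows only 1.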