In this paper, a distributed algorithm is described for detecting and diagnosing faulty processors in an arbitrary network. Fault-free processors perform simple periodic tests on one another; when a fault is detected or a newly repaired processor joins the network, this new information is disseminated in parallel throughout the network. The algorithm is formally proven correct and is shown to be optimal in the time required for all fault-free processors in the network to learn of a new event. Simulation results are given for arbitrary network topologies.

Index Terms: Computer fault diagnosis, computer fault tolerance, computer networks, distributed computing, system-level fault diagnosis, distributed algorithm, fault detection.
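As a rough illustration of the two mechanisms named above, periodic testing and parallel dissemination, the following Python sketch simulates one testing round followed by synchronous flooding of the resulting event. It is not the paper's algorithm. The function names `periodic_tests` and `disseminate`, the synchronous round model, and the assumption that a fault-free tester always reports a neighbour's true status are all hypothetical simplifications. Flooding is modelled as a multi-source breadth-first search over the fault-free subgraph, so each processor's round number equals its distance from the nearest detecting processor.

```python
from collections import deque


def periodic_tests(adjacency, faulty):
    """One testing round: every fault-free processor tests each neighbour.

    Hypothetical model: a test run by a fault-free processor always reports
    the tested processor's true status, so each tester's result is the set
    of neighbours it finds faulty. Faulty processors produce no results.
    """
    return {
        node: {n for n in neighbours if n in faulty}
        for node, neighbours in adjacency.items()
        if node not in faulty
    }


def disseminate(adjacency, faulty, sources):
    """Simulate synchronous parallel flooding of one event.

    In round 0 the fault-free `sources` know the event; in each later round
    every informed processor forwards it to all of its fault-free neighbours
    at once. Faulty processors neither receive nor relay the event. Returns
    a dict mapping each informed processor to the round in which it first
    learned the event (multi-source BFS over the fault-free subgraph).
    """
    learned = {s: 0 for s in sources if s not in faulty}
    frontier = deque(learned)
    while frontier:
        node = frontier.popleft()
        for neighbour in adjacency[node]:
            if neighbour in faulty or neighbour in learned:
                continue
            learned[neighbour] = learned[node] + 1
            frontier.append(neighbour)
    return learned


# Small example network (hypothetical): processor 1 has just failed.
adjacency = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 3],
    3: [1, 2, 4],
    4: [3, 5],
    5: [4],
}
faulty = {1}

results = periodic_tests(adjacency, faulty)
detectors = sorted(node for node, found in results.items() if found)
rounds = disseminate(adjacency, faulty, detectors)

print(detectors)             # [0, 3]
print(rounds)                # {0: 0, 3: 0, 2: 1, 4: 1, 5: 2}
print(max(rounds.values()))  # 2
```

In this simplified model, no scheme can inform a processor sooner than its distance from the nearest detector in the fault-free subgraph, and flooding meets that bound for every processor at once. This gives some intuition for why parallel dissemination can be time-optimal, but it is not a substitute for the formal proof. A newly repaired processor rejoining would be handled the same way, with the testers that observe the repair acting as sources. Processors cut off from every source in the fault-free subgraph never learn the event.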