Abstract: Fault management, which consists of fault detection, diagnosis, and recovery, is one of the key goals of network management. We consider an application of system-level diagnosis concepts to the problem of fault diagnosis in networks. Since the primary function of the nodes in a communication network is to route messages, diagnosis is performed with respect to the ability of the nodes to route correctly. We propose a basic test model in which a node is tested with the help of two other nodes, a tester and a helper, which check whether the tested node correctly routes messages between them. Necessary conditions are derived for a network to be t-diagnosable under this model. The proposed model does not require tests with perfect coverage.
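The tester/helper test model described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names (`candidate_tests`, `run_test`), the adjacency-dictionary representation, the requirement that tester and helper both be neighbors of the tested node, and the rule that a test involving a faulty tester or helper is treated as inconclusive are all assumptions made here for illustration.

```python
from itertools import combinations


def candidate_tests(adj, node):
    """List (tester, helper) pairs that could test `node`.

    Assumption: both the tester and the helper must be direct
    neighbors of the tested node, so that a message between them
    can be routed through it.
    """
    return list(combinations(sorted(adj[node]), 2))


def run_test(tester, tested, helper, faulty):
    """Simulate one routing test against a known (hypothetical) fault set.

    Returns True if the tested node routes correctly, False if it
    does not, and None when the result cannot be trusted because the
    tester or the helper is itself faulty (an assumption of this sketch).
    """
    if tester in faulty or helper in faulty:
        return None
    return tested not in faulty


# A 4-node ring: 0 - 1 - 2 - 3 - 0
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}

tests_for_0 = candidate_tests(adj, 0)            # one pair: tester 1, helper 3
outcome_faulty = run_test(1, 0, 3, faulty={0})   # tested node is faulty
outcome_unknown = run_test(1, 0, 3, faulty={1})  # tester is faulty
```

In the ring above each node has exactly two neighbors, so only one tester/helper pair is available per node. Denser topologies yield more candidate tests per node, which is the kind of structural property that diagnosability conditions constrain.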