The problem of achieving fault diagnosis in a network of interconnected processing elements (called nodes) is considered. It is assumed that there is no central facility to control, coordinate, or mediate among the processing elements. Every node can eventually determine the status of the nodes and of the communication paths between them. A diagnostic algorithm for homogeneous systems (systems with only testing nodes) is given. The self-fault-diagnosis of inhomogeneous systems (systems whose nodes have varying degrees of testing capability) is studied, and diagnostic algorithms are proposed.
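The idea of diagnosis without a central facility can be illustrated with a small simulation. The sketch below is not the algorithm of the paper. It is a minimal illustration under several assumptions that the abstract does not state: every node is a testing node (the homogeneous case), a test performed by a fault-free node is always accurate, test reports are relayed only through nodes already diagnosed as fault-free, and link faults are ignored. The function name `diagnose`, the adjacency-dictionary input format, and the status strings are hypothetical.

```python
from collections import deque


def diagnose(adjacency, faulty):
    """Simulate the local view that each fault-free node eventually builds.

    adjacency: dict mapping node -> list of neighbouring nodes (assumed format).
    faulty:    set of nodes that are actually faulty. This is ground truth used
               only to simulate the outcome of each test.
    Returns a dict mapping each fault-free node to its own view,
    itself a dict of node -> "fault-free" or "faulty".
    """
    views = {}
    for start in adjacency:
        if start in faulty:
            continue  # the diagnosis formed by a faulty node is not trusted
        view = {start: "fault-free"}
        queue = deque([start])
        while queue:
            tester = queue.popleft()  # always a node diagnosed as fault-free
            for neighbour in adjacency[tester]:
                if neighbour in view:
                    continue  # status already known to the starting node
                if neighbour in faulty:
                    # Assumption: a fault-free tester detects the fault.
                    view[neighbour] = "faulty"
                else:
                    view[neighbour] = "fault-free"
                    # A fault-free neighbour may relay its own test results.
                    queue.append(neighbour)
        views[start] = view
    return views


# Hypothetical example: a ring of five nodes in which node 2 is faulty.
ring = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
views = diagnose(ring, faulty={2})
```

In this example each of the four fault-free nodes reaches the same diagnosis by itself, with no coordinator involved. A node that cannot be reached through fault-free nodes simply does not appear in a view, which is how the sketch represents an undiagnosed node.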