This paper considers the system-level diagnosis of soft failures in large, fully distributed networks of processors (or units). It employs a system model in which each unit can test (or evaluate) certain other units for the presence of failures. Under this model, and assuming that the total number of faulty units does not exceed a given bound, it presents a distributed algorithm that lets every fault-free unit converge independently to a diagnosis of the system status that is both correct and consistent across all fault-free units. The algorithm is also shown to apply to bounded-fault situations in which both units and communication links can be faulty.
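The abstract does not spell out the algorithm itself, so what follows is only a minimal sketch of the underlying testing model. It is built on assumptions that do not come from the source. First, it assumes the PMC (Preparata-Metze-Chien) convention: a fault-free tester reports the true status of the unit it tests, while a faulty tester may report anything. Second, it assumes a hypothetical test assignment in which unit i tests units i+1, ..., i+t (mod n). With n ≥ 2t + 1, every unit is then tested by t others and no two units test each other, which makes the assignment t-diagnosable. Third, and most importantly, it assumes every fault-free unit already holds the complete syndrome (the set of all test outcomes). Delivering that syndrome reliably in a fully distributed network is exactly the hard part the paper's algorithm addresses, and it is not modeled here. Under these assumptions, each fault-free unit applies the same deterministic rule: pick the unique fault set of size at most t that explains the syndrome. All fault-free units therefore agree. The helper names (`ring_tests`, `run_tests`, `consistent`, `diagnose`) are hypothetical, and the brute-force search is exponential in t, so this illustrates the diagnosis criterion rather than an efficient method.

```python
import itertools
import random

def ring_tests(n, t):
    """Hypothetical test assignment: unit i tests units i+1, ..., i+t (mod n)."""
    return [(i, (i + k) % n) for i in range(n) for k in range(1, t + 1)]

def run_tests(tests, faulty, rng):
    """Simulate one round of testing under the (assumed) PMC convention.

    Outcome 1 means "tested unit is faulty". A fault-free tester reports the
    truth; a faulty tester's outcome is arbitrary (random here).
    """
    syndrome = {}
    for tester, tested in tests:
        if tester in faulty:
            syndrome[(tester, tested)] = rng.randint(0, 1)
        else:
            syndrome[(tester, tested)] = int(tested in faulty)
    return syndrome

def consistent(candidate, syndrome):
    """True if every test run by a unit outside `candidate` reports exactly
    the status that `candidate` assigns to the tested unit."""
    return all(
        result == int(tested in candidate)
        for (tester, tested), result in syndrome.items()
        if tester not in candidate
    )

def diagnose(n, t, syndrome):
    """Return the unique fault set of size <= t that explains the syndrome,
    or None if no such set exists or more than one does."""
    matches = [
        set(c)
        for size in range(t + 1)
        for c in itertools.combinations(range(n), size)
        if consistent(set(c), syndrome)
    ]
    return matches[0] if len(matches) == 1 else None

if __name__ == "__main__":
    n, t = 7, 3          # satisfies n >= 2t + 1
    faulty = {2, 5}      # actual faulty units, unknown to the diagnosers
    syndrome = run_tests(ring_tests(n, t), faulty, random.Random(1))
    # Assumption: every fault-free unit holds this same complete syndrome and
    # runs the same deterministic rule on it.
    views = {u: diagnose(n, t, syndrome) for u in range(n) if u not in faulty}
    print(views)         # every fault-free unit reports {2, 5}
```

Because fault-free testers always report truthfully, the actual fault set always explains the syndrome, so `diagnose` returns it whenever the assignment is t-diagnosable and at most t units are faulty. With fewer than 2t + 1 units, the search may find several explanations and return `None`.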