Dynamic Testing Strategy for Distributed Systems
IEEE Transactions on Computers
Multiprocessor systems can provide service that is both highly reliable and fast. Distributed self-test algorithms that aim to improve both the reliability and the performance of these systems are proposed. Reliability is improved by adopting a distributed mode of control in which processors periodically test one another to diagnose and isolate faulty processors and interprocessor links. Performance is improved by a dynamic testing strategy that minimizes testing overhead by reducing the number of tests performed on each processor. Simulation results show the effectiveness of the algorithms.
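The abstract does not spell out the algorithms, so the following is only a minimal sketch of the general idea of processors testing one another while keeping the number of tests small. It is not the paper's method: it assumes a ring-based adaptive scheme in which each processor tests its successors until one passes, assumes that a fault-free tester always reports correctly, and ignores link faults. The names `run_tests` and `diagnose` are hypothetical.

```python
def run_tests(n, faulty):
    """Simulate one testing round on a ring of n processors.

    Each processor tests its successors in ring order and stops at the
    first one that passes, which keeps the number of tests low.
    Assumption: a fault-free tester reports the true state of its target;
    a faulty tester's report is unreliable (modelled here as "always pass").
    """
    results = {}
    for tester in range(n):
        outcome = []
        for step in range(1, n):
            target = (tester + step) % n
            if tester in faulty:
                passed = True  # unreliable report from a faulty tester
            else:
                passed = target not in faulty
            outcome.append((target, passed))
            if passed:
                break  # stop testing once a passing successor is found
        results[tester] = outcome
    return results


def diagnose(start, results):
    """Diagnose faulty processors from the viewpoint of a fault-free `start`.

    Only reports reached through a chain of passing tests are trusted,
    so reports produced by faulty testers are never consulted.
    """
    faulty = set()
    node = start
    while True:
        next_node = None
        for target, passed in results[node]:
            if passed:
                next_node = target
            else:
                faulty.add(target)
        if next_node is None or next_node == start:
            return faulty
        node = next_node


if __name__ == "__main__":
    round_results = run_tests(6, {2, 3})
    print(sorted(diagnose(0, round_results)))
```

In this sketch, the fault-free processors perform one test each plus one extra test per faulty successor they skip over, rather than every processor testing every other one, which is the kind of overhead reduction the abstract describes in general terms.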