The authors present and analyze a probabilistic model for the self-diagnosis capabilities of a multiprocessor system. In this model, an individual processor fails with probability p, and a nonfaulty processor testing a faulty processor detects the fault with probability q. This captures both the case where processors are intermittently faulty and the case where tests cannot detect every possible fault within a processor. An efficient algorithm is presented that achieves correct diagnosis with high probability in systems with O(n log n) connections, where n is the number of processors. It is the first algorithm able to diagnose a large number of intermittently faulty processors in a class of systems that includes hypercubes. It is also shown that, under this model, no algorithm can achieve correct diagnosis with high probability in regular systems that conduct a number of tests dominated by n log n. Examples are given of systems that perform a modest number of tests and for which the algorithm's probability of correct diagnosis is very nearly one.
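
The fault and test model described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' algorithm: it samples faults with probability p, runs tests on a hypercube with n = 2^dim processors (n log2 n directed tests in total), and applies a simple vote-threshold rule. The function names, the threshold rule, and the assumption that a faulty tester reports a fair coin flip are all hypothetical choices made for illustration; the abstract does not specify how faulty testers behave.

```python
import random


def hypercube_neighbors(node, dim):
    """Return the dim neighbors of node in a dim-dimensional hypercube."""
    return [node ^ (1 << bit) for bit in range(dim)]


def sample_faults(n, p, rng):
    """Each processor is faulty independently with probability p."""
    return [rng.random() < p for _ in range(n)]


def run_tests(faulty, dim, q, rng):
    """Every processor tests each of its hypercube neighbors once.

    Returns a dict mapping (tester, tested) to True when the tester
    reports a fault.
    """
    syndrome = {}
    for tester in range(1 << dim):
        for tested in hypercube_neighbors(tester, dim):
            if faulty[tester]:
                # Assumption: a faulty tester's report is a fair coin flip.
                outcome = rng.random() < 0.5
            elif faulty[tested]:
                # A nonfaulty tester detects a faulty unit with probability q.
                outcome = rng.random() < q
            else:
                # A nonfaulty tester never accuses a nonfaulty unit.
                outcome = False
            syndrome[(tester, tested)] = outcome
    return syndrome


def threshold_diagnosis(dim, syndrome, threshold):
    """Hypothetical rule: declare a processor faulty when the fraction of
    its testers reporting a fault exceeds threshold."""
    diagnosed = []
    for node in range(1 << dim):
        votes = sum(syndrome[(t, node)] for t in hypercube_neighbors(node, dim))
        diagnosed.append(votes / dim > threshold)
    return diagnosed


if __name__ == "__main__":
    rng = random.Random(1)
    dim, p, q = 10, 0.05, 0.8
    faulty = sample_faults(1 << dim, p, rng)
    syndrome = run_tests(faulty, dim, q, rng)
    diagnosed = threshold_diagnosis(dim, syndrome, 0.4)
    errors = sum(a != b for a, b in zip(faulty, diagnosed))
    print(f"{len(syndrome)} tests, {sum(faulty)} faults, {errors} misdiagnosed")
```

Varying p, q, the threshold, and the dimension gives a rough feel for how the number of tests per processor affects diagnosis accuracy under these assumptions; it says nothing about the guarantees proved for the authors' algorithm.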