Optimal Diagnosis of Heterogeneous Systems with Random Faults

Authors: Andrzej Pelc
Affiliations: Univ. du Québec à Hull, Québec
Venue: IEEE Transactions on Computers
Year: 1998



Abstract

We consider the problem of fault diagnosis in multiprocessor systems. Processors perform tests on one another; fault-free testers correctly identify the fault status of tested processors, while faulty testers can give arbitrary test results. Processors fail with arbitrary probabilities and all failures are independent. The goal is to identify correctly the status of all processors, based on the set of test results. A diagnosis algorithm is optimal if it has the highest probability of correctness (reliability) among all (deterministic) diagnosis algorithms. We give a fast diagnosis algorithm and prove its optimality for arbitrary values of failure probabilities. This is the first time that optimal diagnosis is given for systems without any assumptions on the behavior of faulty processors or on the values of failure probabilities.

We also investigate locally optimal diagnosis algorithms: for any set of test results, they return the most probable configuration of faulty and fault-free processors that could yield it. We show a fast diagnosis algorithm that is always locally optimal. If all processors have failure probabilities smaller than $\frac{1}{2}$, a locally optimal diagnosis is proved to be optimal. However, if some processors have failure probabilities exceeding $\frac{1}{2}$, a locally optimal diagnosis need not have the highest reliability. We even give examples in which its reliability becomes arbitrarily small as the number of processors increases, while the optimal reliability remains constant.
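The notion of a locally optimal diagnosis can be illustrated with a small brute-force sketch. It is not the fast algorithm from the paper: it enumerates every configuration, so it is only usable for a handful of processors. It assumes one reading of the abstract, namely that a configuration "could yield" a set of test results when every fault-free tester reports the true status of the processor it tests, and that the most probable such configuration is the one with the largest prior probability under independent failures. The function names, the `tests` dictionary layout, and the example probabilities are hypothetical.

```python
from itertools import product
from math import prod


def is_consistent(faulty, tests):
    """Check whether a configuration could have produced the test results.

    faulty: tuple of bools, faulty[i] is True if processor i is assumed faulty.
    tests:  dict mapping (tester, tested) -> True if the tester reported
            the tested processor as faulty (hypothetical layout).
    Only fault-free testers are constrained; faulty testers may report anything.
    """
    for (tester, tested), says_faulty in tests.items():
        if not faulty[tester] and says_faulty != faulty[tested]:
            return False
    return True


def config_probability(faulty, p):
    """Prior probability of a configuration under independent failures."""
    return prod(p[i] if f else 1.0 - p[i] for i, f in enumerate(faulty))


def brute_force_locally_optimal(p, tests):
    """Return the most probable consistent configuration and its probability.

    Exhaustive search over all 2**len(p) configurations (illustration only).
    The all-faulty configuration is always consistent, so a result exists.
    """
    best, best_prob = None, -1.0
    for faulty in product((False, True), repeat=len(p)):
        if is_consistent(faulty, tests):
            pr = config_probability(faulty, p)
            if pr > best_prob:
                best, best_prob = faulty, pr
    return best, best_prob


# Three processors testing each other in a ring (made-up numbers).
p = [0.1, 0.2, 0.3]
tests = {(0, 1): False, (1, 2): True, (2, 0): False}
diagnosis, probability = brute_force_locally_optimal(p, tests)
print(diagnosis)  # (False, False, True): only processor 2 is diagnosed faulty
```

In this example all failure probabilities are below $\frac{1}{2}$, the regime in which the abstract states that a locally optimal diagnosis is also optimal. The sketch does not reproduce the paper's constructions for probabilities exceeding $\frac{1}{2}$.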