Formally Verified On-Line Diagnosis

Authors:
Chris J. Walter;Patrick Lincoln;Neeraj Suri
Venue:
IEEE Transactions on Software Engineering
Year:
1997

Abstract

A reconfigurable fault-tolerant system achieves dependable operation through fault detection, fault isolation, and reconfiguration, typically referred to as the FDIR paradigm. Fault diagnosis is a key component of this approach, requiring an accurate determination of the health and state of the system. An imprecise state assessment can lead to catastrophic failure when the diagnosis is optimistic or, conversely, to underutilization of resources when the diagnosis is pessimistic. Unlike classical testing and other off-line diagnostic approaches, we develop procedures that make maximal use of system state information to provide continual, on-line diagnosis and reconfiguration as an integral part of system operation. Our diagnosis approach, unlike existing techniques, does not require administered testing to gather syndrome information; instead, it is based on monitoring the message traffic among redundant system functions. We present comprehensive on-line diagnosis algorithms capable of handling a continuum of faults of varying severity at the node and link level. Not only are the proposed algorithms on-line in nature, but they are also themselves tolerant to faults in the diagnostic process. Formal analysis is presented for all proposed algorithms. These proofs both offer insight into the algorithms' operation and facilitate a rigorous formal verification of the developed algorithms.
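
The idea of gathering syndrome information from ordinary message traffic, rather than from administered tests, can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names `local_syndrome` and `diagnose`, the majority-comparison rule, and the strict-majority accusation threshold are all assumptions chosen for illustration, and the sketch ignores link faults and the graded fault severities the abstract mentions.

```python
from collections import Counter


def local_syndrome(received):
    """Build one observer's syndrome from the messages it received.

    `received` maps sender name -> value, with None for a missing message.
    A sender is accused when its value differs from the majority value.
    (Hypothetical rule, assumed here for illustration only.)
    """
    values = [v for v in received.values() if v is not None]
    if not values:
        # Nothing arrived at all: every sender looks faulty to this observer.
        return set(received)
    majority, _ = Counter(values).most_common(1)[0]
    return {sender for sender, v in received.items() if v != majority}


def diagnose(syndromes):
    """Combine all observers' syndromes into a system-level diagnosis.

    `syndromes` maps observer name -> set of senders it accuses.
    A node is declared faulty only when a strict majority of observers
    accuse it, so a single faulty observer cannot convict a good node.
    """
    n = len(syndromes)
    accusations = Counter(s for accused in syndromes.values() for s in accused)
    return {node for node, count in accusations.items() if count > n // 2}


if __name__ == "__main__":
    # Four redundant nodes exchange a value; D sends a deviating one.
    seen_by_good_node = {"A": 1, "B": 1, "C": 1, "D": 7}
    print(local_syndrome(seen_by_good_node))

    # D, being faulty, also files a bogus accusation against A.
    syndromes = {"A": {"D"}, "B": {"D"}, "C": {"D"}, "D": {"A"}}
    print(diagnose(syndromes))
```

In this toy setting the bogus accusation from the faulty observer is outvoted, which loosely mirrors the abstract's point that the diagnostic process itself should tolerate faults; the actual algorithms and their guarantees are those given in the paper.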