Issues on the design of efficient fail-safe fault tolerance

Authors:
Arshad Jhumka, Matt Leeke
Affiliation:
Department of Computer Science, University of Warwick, Coventry, UK (both authors)
Venue:
ISSRE '09: Proceedings of the 20th IEEE International Conference on Software Reliability Engineering
Year:
2009

Abstract

The design of a fault-tolerant program is known to be inherently difficult. Decisions taken during the design process invariably affect the efficiency of the resulting fault-tolerant program. In this paper, we focus on two such decisions, namely (i) the class of faults the program is to tolerate, and (ii) the variables that can be read and written. The impact of these design decisions on the overall fault tolerance of the system needs to be well understood; otherwise, costly redesigns may result. For the first issue, the impact of fault classes on the efficiency of fail-safe fault tolerance, we show that, under a general fault model, it is impossible to preserve the original behavior of the fault-intolerant program. For the second issue, read and write constraints on variables, we again show that it is impossible to preserve the original behavior of the fault-intolerant program. We analyze the reasons behind these impossibility results and suggest possible ways of circumventing them.
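The following Python sketch shows, on two toy programs, how each design decision can force a fail-safe version to drop fault-free behavior. It assumes the state-transition model common in the fail-safe synthesis literature: a program, its faults and its safety specification are each treated as sets of (source, target) transitions. The helper names (`unsafe_states`, `make_fail_safe`, `preserves`, `blind_to_y`), the toy programs and the repair rule are hypothetical illustrations, not the authors' construction or proofs. The repair rule drops every program transition that is bad or that enters a state from which faults alone can violate safety. Each toy program is a single instance and does not prove the general impossibility results.

```python
# Hypothetical sketch, not the authors' algorithm: programs, faults and the
# safety specification are sets of (source_state, target_state) transitions.


def unsafe_states(faults, bad):
    """States from which fault transitions alone can execute a bad transition."""
    ms = {s for (s, t) in faults if (s, t) in bad}
    changed = True
    while changed:  # backward closure over fault transitions
        changed = False
        for (s, t) in faults:
            if t in ms and s not in ms:
                ms.add(s)
                changed = True
    return ms


def make_fail_safe(program, faults, bad, group_of=lambda tr: {tr}):
    """Drop each program transition that is bad or enters an unsafe state.

    group_of(tr) returns every transition that must be dropped together with
    tr; under a read restriction these are the transitions the process cannot
    distinguish from tr.
    """
    ms = unsafe_states(faults, bad)
    removed = set()
    for tr in program:
        if tr in bad or tr[1] in ms:
            removed |= group_of(tr)
    return program - removed


def preserves(original, repaired, invariant):
    """True if every original transition starting in the invariant survives."""
    return all(tr in repaired for tr in original if tr[0] in invariant)


# Issue (i): a fault class that can itself execute a bad transition.
prog1 = {(0, 1), (1, 2), (2, 0)}
inv1 = {0, 1, 2}
bad1 = {(2, 3)}
safe1 = make_fail_safe(prog1, faults={(2, 3)}, bad=bad1)        # general faults
restricted1 = make_fail_safe(prog1, faults={(0, 3)}, bad=bad1)  # milder faults


# Issue (ii): process P writes x but cannot read y; states are (x, y).
def blind_to_y(tr):
    """All P-transitions with the same x-step, for every value of unread y."""
    return {((tr[0][0], y), (tr[1][0], y)) for y in (0, 1)}


prog2 = {((0, y), (1, y)) for y in (0, 1)}
inv2 = {(0, 0), (1, 0)}
faults2 = {((x, 0), (x, 1)) for x in (0, 1)}  # faults may set y to 1
bad2 = {((0, 1), (1, 1))}                     # x must not change while y == 1
safe2 = make_fail_safe(prog2, faults2, bad2, group_of=blind_to_y)
full_read = make_fail_safe(prog2, faults2, bad2)

print(preserves(prog1, safe1, inv1))        # False: (1, 2) had to go
print(preserves(prog1, restricted1, inv1))  # True
print(preserves(prog2, safe2, inv2))        # False
print(preserves(prog2, full_read, inv2))    # True
```

In the first toy program, a fault that is itself a bad transition makes state `2` unsafe, so the fault-free step `(1, 2)` must be removed. With the milder fault class nothing is removed. In the second, because `P` cannot read `y`, removing the single unsafe transition also removes its indistinguishable fault-free counterpart `((0, 0), (1, 0))`. If `P` could read `y`, that counterpart would survive.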