Software Fault Tolerance: A Tutorial

Authors:
Torres Wilfredo
Affiliations:
-
Venue:
Software Fault Tolerance: A Tutorial
Year:
2000

Abstract

Because of our present inability to produce error-free software, software fault tolerance is and will continue to be an important consideration in software systems. The root cause of software design errors is the complexity of the systems. Compounding the problems in building correct software is the difficulty in assessing the correctness of software for highly complex systems. After a brief overview of the software development processes, we note how hard-to-detect design faults are likely to be introduced during development and how software faults tend to be state-dependent and activated by particular input sequences. Although component reliability is an important quality measure for system-level analysis, software reliability is hard to characterize and the use of post-verification reliability estimates remains a controversial issue. For some applications software safety is more important than reliability, and fault tolerance techniques used in those applications are aimed at preventing catastrophes. Single-version software fault tolerance techniques discussed include system structuring and closure, atomic actions, inline fault detection, exception handling, and others. Multiversion techniques are based on the assumption that software built differently should fail differently and thus, if one of the redundant versions fails, it is expected that at least one of the other versions will provide an acceptable output. Recovery blocks, N-version programming, and other multiversion techniques are reviewed.
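
The two multiversion techniques named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the tutorial. The function names (`recovery_block`, `n_version`), the integer square root task, and the deliberately faulty variant are all assumptions chosen for illustration. A recovery block tries variants one after another and keeps the first result that passes an acceptance test. N-version programming runs all versions and returns the majority output.

```python
import copy
import math
from collections import Counter


def recovery_block(value, variants, acceptance_test):
    """Try variants in order; return the first result that passes the test.

    Each variant receives a fresh copy of the input, which stands in for
    restoring a checkpoint before the next alternate runs.
    """
    for variant in variants:
        checkpoint = copy.deepcopy(value)
        try:
            result = variant(checkpoint)
        except Exception:
            continue  # a crash counts as a failed variant
        if acceptance_test(result):
            return result
    raise RuntimeError("all variants failed the acceptance test")


def n_version(value, versions):
    """Run every version and return the output agreed on by a strict majority."""
    outputs = []
    for version in versions:
        try:
            outputs.append(version(value))
        except Exception:
            pass  # a crashed version contributes no vote
    if outputs:
        winner, votes = Counter(outputs).most_common(1)[0]
        if votes > len(versions) // 2:
            return winner
    raise RuntimeError("no majority among version outputs")


# Hypothetical, independently written versions of an integer square root.
def faulty_isqrt(x):
    return x // 2  # design fault: wrong for most inputs


def float_isqrt(x):
    return int(x ** 0.5)  # adequate for small inputs only


if __name__ == "__main__":
    x = 16
    accept = lambda r: r * r <= x < (r + 1) ** 2
    print(recovery_block(x, [faulty_isqrt, math.isqrt], accept))
    print(n_version(x, [math.isqrt, float_isqrt, faulty_isqrt]))
```

In this sketch the recovery block depends on the quality of its acceptance test, while the voter depends on the versions failing differently, which is the assumption the abstract states for multiversion techniques.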