Because we are presently unable to produce error-free software, software fault tolerance is, and will continue to be, an important consideration in software systems. The root cause of software design errors is system complexity. Compounding the problem of building correct software is the difficulty of assessing the correctness of highly complex systems. After a brief overview of the software development process, we note how hard-to-detect design faults are likely to be introduced during development and how software faults tend to be state-dependent and activated by particular input sequences. Although component reliability is an important quality measure for system-level analysis, software reliability is hard to characterize, and the use of post-verification reliability estimates remains controversial. For some applications software safety is more important than reliability, and the fault tolerance techniques used in those applications are aimed at preventing catastrophes. The single-version software fault tolerance techniques discussed include system structuring and closure, atomic actions, inline fault detection, exception handling, and others. Multiversion techniques are based on the assumption that software built differently should fail differently; thus, if one of the redundant versions fails, at least one of the others is expected to provide an acceptable output. Recovery blocks, N-version programming, and other multiversion techniques are reviewed.
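To make the two multiversion techniques named above concrete, the following is a minimal sketch of a recovery block and an N-version majority voter. It is an illustration under stated assumptions, not the formulation used in the paper: the names `recovery_block`, `n_version_vote`, `AllAlternatesFailed`, `buggy_sort`, `simple_sort`, and `is_acceptable` are hypothetical, the checkpoint is simulated with a deep copy of an in-memory value, and the voter assumes version outputs are hashable and compared by exact equality.

```python
import copy
from collections import Counter


class AllAlternatesFailed(Exception):
    """Hypothetical error: no alternate or version produced an acceptable result."""


def recovery_block(state, alternates, acceptance_test):
    """Try each alternate in order, restoring the checkpoint before every attempt."""
    checkpoint = copy.deepcopy(state)          # saved state to roll back to
    for alternate in alternates:
        working = copy.deepcopy(checkpoint)    # each attempt starts from the checkpoint
        try:
            result = alternate(working)
        except Exception:
            continue                           # an exception counts as a failed attempt
        if acceptance_test(result):
            return result                      # first result that passes the test wins
    raise AllAlternatesFailed("no alternate passed the acceptance test")


def n_version_vote(versions, value):
    """Run every version on the same input and return the strict-majority output."""
    outputs = []
    for version in versions:
        try:
            outputs.append(version(value))     # outputs are assumed to be hashable
        except Exception:
            pass                               # a crashed version casts no vote
    if not outputs:
        raise AllAlternatesFailed("every version failed")
    winner, votes = Counter(outputs).most_common(1)[0]
    if votes * 2 <= len(versions):             # majority is measured against all N versions
        raise AllAlternatesFailed("no majority among version outputs")
    return winner


def buggy_sort(items):
    items.sort()
    return items[:-1]                          # deliberately seeded fault: drops one element


def simple_sort(items):
    return sorted(items)


data = [3, 1, 2]


def is_acceptable(result):
    # Acceptance test: same length as the input and in non-decreasing order.
    return len(result) == len(data) and all(a <= b for a, b in zip(result, result[1:]))


sorted_data = recovery_block(data, [buggy_sort, simple_sort], is_acceptable)
majority = n_version_vote([abs, lambda x: -x if x < 0 else x, lambda x: x], -4)
```

In this sketch the primary alternate fails the acceptance test, so the recovery block rolls back and accepts the output of the secondary; the voter masks the one faulty absolute-value version because the other two agree. The design difference is that the recovery block runs alternates sequentially and depends on the quality of its acceptance test, whereas the voter runs all versions and depends on their failures being different.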