A survey of software development approaches addressing dependability
FIDJI'04 Proceedings of the 4th international conference on Scientific Engineering of Distributed Java Applications
This paper presents an overview of the techniques that developers can use to produce software that tolerates design faults and faults in the surrounding environment. After reviewing the basic terms and concepts of fault tolerance, it presents the best-known fault-tolerance techniques that exploit software, information, and time redundancy, classified according to the kind of concurrency they support.