The need for reliable complex systems motivates the development of techniques by which acceptable service can be maintained, even in the presence of residual errors. Recovery blocks allow a software designer to include tests on the acceptability of the various phases of a system's operation, and to specify alternative actions should the acceptance tests fail. This approach relies on certain architectural features, ideally implemented in hardware, by which control and data structures can be retrieved after errors. A brief account is presented of the recovery block scheme, together with a description of a new implementation of the underlying cache mechanism. The salient features of a proposed computer architecture are described, which incorporates this implementation and also provides a high level of detection for errors such as the corruption of code and data. A prototype system has been constructed to test the viability of these techniques by executing programs containing recovery blocks on an emulator for the proposed architecture. Experiences in running this system are recounted with respect to the execution of programs based on erroneous algorithms and also with respect to errors introduced by deliberate attempts to corrupt the system.
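
The recovery block scheme summarized above can be illustrated with a small software sketch. It is not the architecture or cache implementation the paper proposes, which is intended for hardware; it only mimics the general idea in Python. All names here (`RecoveryCache`, `recovery_block`, `RecoveryBlockFailure`, `faulty_sort`, `simple_sort`) are hypothetical, and the sketch assumes program state is held in a dictionary whose entries are replaced through `write` rather than mutated in place.

```python
class RecoveryCache:
    """Software stand-in (an assumption) for a recovery cache: it records the
    prior value of each variable the first time that variable is written."""

    _ABSENT = object()  # marks variables that did not exist at block entry

    def __init__(self, state):
        self.state = state  # dict standing in for the program's variables
        self.saved = {}     # name -> value held before the first write

    def write(self, name, value):
        # Only the first write per variable needs to save the old value.
        # Note: values are saved by reference, so in-place mutation of a
        # stored object would not be undone by this sketch.
        if name not in self.saved:
            self.saved[name] = self.state.get(name, self._ABSENT)
        self.state[name] = value

    def restore(self):
        # Undo every recorded write, returning to the state at block entry.
        for name, old in self.saved.items():
            if old is self._ABSENT:
                self.state.pop(name, None)
            else:
                self.state[name] = old
        self.saved.clear()

    def commit(self):
        # The acceptance test passed, so the saved values can be discarded.
        self.saved.clear()


class RecoveryBlockFailure(Exception):
    """Raised when every alternate has failed the acceptance test."""


def recovery_block(state, acceptance_test, alternates):
    """Try each alternate in turn until one passes the acceptance test."""
    cache = RecoveryCache(state)
    for alternate in alternates:
        try:
            alternate(cache)
            if acceptance_test(state):
                cache.commit()
                return alternate.__name__
        except Exception:
            pass  # a raised error is treated like a failed acceptance test
        cache.restore()  # roll back before trying the next alternate
    raise RecoveryBlockFailure("all alternates failed the acceptance test")


def faulty_sort(cache):
    # Deliberately erroneous primary: it loses the largest element.
    cache.write("data", sorted(cache.state["data"])[:-1])


def simple_sort(cache):
    # Simpler alternate used once the primary has been rejected.
    cache.write("data", sorted(cache.state["data"]))


def make_acceptance_test(expected_len):
    def accept(state):
        d = state["data"]
        return len(d) == expected_len and all(a <= b for a, b in zip(d, d[1:]))
    return accept


state = {"data": [3, 1, 2]}
used = recovery_block(state, make_acceptance_test(3), [faulty_sort, simple_sort])
print(used, state["data"])  # prints: simple_sort [1, 2, 3]
```

In this run the primary alternate produces a list that is too short, the acceptance test rejects it, the saved value is restored, and the second alternate succeeds. Saving a value only on the first write to each variable keeps the rollback cost proportional to the number of variables changed rather than to the size of the whole state.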