This paper presents and discusses the rationale behind a method for structuring complex computing systems by the use of what we term "recovery blocks," "conversations," and "fault-tolerant interfaces." The aim is to facilitate the provision of dependable error detection and recovery facilities which can cope with errors caused by residual design inadequacies, particularly in the system software, rather than merely the occasional malfunctioning of hardware components.