The concept of distributed execution of recovery blocks is examined as an approach for uniform treatment of hardware and software faults. A useful characteristic of the approach is the relatively small time cost it requires. The approach is thus suitable for incorporation into real-time computer systems. A specific formulation of the approach that is aimed at minimizing the recovery time is presented, called the distributed recovery blocks scheme. The DRB scheme is capable of effecting forward recovery while handling both hardware and software faults in a uniform manner. An approach to incorporating the capability for distributed execution of recovery blocks into a load-sharing multiprocessing scheme is also discussed. Two experiments aimed at testing the execution efficiency of the scheme in real-time applications have been conducted on two different multimicrocomputer networks. The results clearly indicate the feasibility of achieving tolerance of hardware and software faults.
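The core idea can be illustrated with a minimal sketch. It assumes a two-node arrangement in which a primary node runs the primary try block while a shadow node runs an alternate try block on the same input at the same time, and each result is checked by an acceptance test. If the primary result is rejected, or the primary crashes, the shadow's result is used without rollback and retry, which is the forward-recovery behavior described above. Threads stand in for the two nodes here, and all names (`drb_execute`, `run_try_block`, `faulty_sort`, `simple_sort`, `accept`) are hypothetical and chosen for illustration only; this is not the implementation evaluated in the experiments.

```python
from concurrent.futures import ThreadPoolExecutor


def run_try_block(try_block, acceptance_test, data):
    """Run one try block and apply the acceptance test; return (ok, result)."""
    try:
        result = try_block(data)
    except Exception:
        # A crash is treated the same way as a rejected result, so both
        # kinds of fault go through one recovery path.
        return False, None
    return acceptance_test(data, result), result


def drb_execute(primary, alternate, acceptance_test, data):
    """Run both try blocks concurrently and return (source, result)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Two worker threads stand in for the primary and shadow nodes
        # (an assumption of this sketch, not a real multi-node setup).
        primary_node = pool.submit(run_try_block, primary, acceptance_test, data)
        shadow_node = pool.submit(run_try_block, alternate, acceptance_test, data)

        ok, result = primary_node.result()
        if ok:
            return "primary", result

        # Forward recovery: the shadow has already been computing, so its
        # result is taken instead of rolling back and retrying.
        ok, result = shadow_node.result()
        if ok:
            return "shadow", result
    raise RuntimeError("both try blocks failed the acceptance test")


def faulty_sort(xs):
    # Deliberately buggy primary try block: it drops the largest element.
    return sorted(xs)[:-1]


def simple_sort(xs):
    # Alternate try block: a simpler, independent routine.
    return sorted(xs)


def accept(xs, out):
    # Acceptance test: same length as the input and in non-decreasing order.
    return len(out) == len(xs) and all(a <= b for a, b in zip(out, out[1:]))


source, result = drb_execute(faulty_sort, simple_sort, accept, [3, 1, 2])
```

In this run the primary result is rejected by the acceptance test because an element is missing, so `source` is `"shadow"` and `result` is `[1, 2, 3]`. Because the shadow runs concurrently rather than after a failure is detected, the recovery delay is limited to picking up an already computed result, which is the property that makes the approach attractive for real-time use.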