This paper presents an experimental evaluation of a software-implemented fault tolerance (SIFT) environment built around a set of self-checking processes called ARMORs running on different machines that provide error detection and recovery services to themselves and to spaceborne scientific applications. The experiments are split into three groups of error injections, with each group successively stressing the SIFT error detection and recovery more than the previous group. The results show that the SIFT environment adds negligible overhead to the application during failure-free runs. Only 11 cases were observed in which either the application failed to start or the SIFT environment failed to recognize that the application had completed. Further investigations showed that assertions within the SIFT processes, coupled with object-based incremental checkpointing, were effective in preventing system failures by protecting dynamic data within the SIFT processes.