Non-intrusive system level fault-tolerance

Authors:
Kristina Lundqvist;Jayakanth Srinivasan;Sébastien Gorelov
Affiliations:
Embedded Systems Laboratory, Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, MA (all authors)
Venue:
Ada-Europe'05 Proceedings of the 10th Ada-Europe international conference on Reliable Software Technologies
Year:
2005

Abstract

High-integrity embedded systems operate in multiple modes to ensure system availability in the face of faults. Unanticipated state-dependent faults that remain in software after system design and development behave like transient hardware faults: they appear, do their damage, and disappear. The conventional approach to handling task overruns caused by transient faults is to use a single recovery task that implements minimal functionality. This approach provides limited availability and should be used only as a last resort to keep the system online. Traditional fault detection approaches are often intrusive in that they consume processor resources to monitor system behavior. This paper presents a novel approach to fault monitoring that leverages the Ravenscar profile, model checking, and a system-on-chip implementation of both the kernel and an execution-time monitor. System fault tolerance is provided through a hierarchical set of operational modes that are based on timing-behavior violations of individual tasks within the application. The approach is illustrated through a simple case study of a generic navigation system.