A Nonpreemptive Real-Time Scheduler with Recovery from Transient Faults and Its Implementation

Authors:
Daniel Mossé; Rami Melhem; Sunondo Ghosh
Venue:
IEEE Transactions on Software Engineering
Year:
2003

Abstract

Real-time systems (RTS) are those whose correctness depends on satisfying both the required functional properties and the required temporal properties. Because such systems are critical, recovery from faults is an essential part of an RTS. In many systems, such as those supporting space applications, single event upsets (SEUs) are the prevalent type of fault; SEUs are transient faults that affect a single task at a time. This paper presents a scheme to guarantee that the execution of real-time tasks can tolerate SEUs and intermittent faults, assuming any queue-based scheduling technique. Three algorithms are presented to solve the problem of adding fault tolerance to a queue of real-time tasks by reserving sufficient slack in a schedule so that recovery can be carried out before the task deadline without compromising the guarantees given to other tasks. The first algorithm is an optimal dynamic-programming solution, the second is a linear-time heuristic for scheduling dynamic tasks, and the third comprises extensions to address queues with gaps between tasks (gaps are caused by precedence, resource, or timing constraints). We show through simulations that the heuristics closely approximate the optimal algorithm. Finally, the paper describes the implementation of the modified admission control algorithm, the nonpreemptive scheduler, and a recovery mechanism in the FT-RT-Mach operating system.
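
The idea of reserving slack for recovery can be illustrated with a small feasibility check. The sketch below is not one of the paper's three algorithms. It is a minimal illustration under assumptions that the abstract does not state: tasks run back to back from time 0 in queue order with no gaps, at most one transient fault hits the whole queue, a fault is detected when the affected task finishes, and recovery means re-executing that task once. The names `Task` and `tolerates_single_fault` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    wcet: float      # worst-case execution time (assumed known)
    deadline: float  # absolute deadline, measured from time 0


def tolerates_single_fault(queue):
    """Return True if every task still meets its deadline when any one
    task in the queue is re-executed once (illustrative assumption)."""
    finish = 0.0        # fault-free completion time of the current task
    max_recovery = 0.0  # costliest re-execution among tasks seen so far
    for task in queue:
        finish += task.wcet
        max_recovery = max(max_recovery, task.wcet)
        # Worst case for this task: the single fault hit the longest
        # task at or before it, delaying it by that task's wcet.
        if finish + max_recovery > task.deadline:
            return False
    return True


relaxed = [Task("a", 2, 5), Task("b", 3, 9), Task("c", 1, 10)]
tight = [Task("a", 2, 5), Task("b", 3, 7)]
print(tolerates_single_fault(relaxed))
print(tolerates_single_fault(tight))
```

In this sketch the queue `tight` is feasible without faults (task `b` finishes at time 5, before its deadline of 7) but has too little slack to absorb a re-execution, which is the kind of situation an admission test with reserved slack is meant to reject.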