We study the problem of testing whether a given set of sequenced jobs can tolerate transient faults. We present efficient algorithms for this problem in several fault models. A fault model describes what types of faults are allowed and specifies assumptions on their frequency. Two types of faults are considered: hidden faults, which can be detected only after a job completes, and exposed faults, which can be detected immediately.

First, we give an O(n)-time fault-tolerance testing algorithm, for both exposed and hidden faults, if the number of faults does not exceed a given parameter k.

Then we consider the model in which any two faults are separated in time by a gap of length at least Δ, where Δ is at least twice the maximum job length. For exposed faults, we give an O(n)-time algorithm. For hidden faults, we give an algorithm with running time O(n²), and we prove that if job lengths are distributed uniformly over the interval [0, p_max], then this algorithm's expected running time is O(n). Our experimental study shows that this linear-time performance extends to other distributions. Finally, we provide evidence that improving the worst-case performance may not be possible, by proving an Ω(n²) lower bound, in the algebraic computation tree model, on a slight generalization of this problem.
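The abstract does not spell out the job model, so the following is only a minimal Python sketch of why a bounded number of faults can be tested in a single linear pass. It rests on simplifying assumptions that are ours, not necessarily the paper's: jobs run back-to-back from time 0 in the given order with no release times or idle time; job j has length p_j and deadline d_j and is on time if it completes by d_j; and each hidden fault forces one full re-execution of the job it strikes. Under these assumptions the worst case for job j is that all k faults hit the longest job among the first j, so it suffices to track a prefix sum and a prefix maximum. The function names are hypothetical, and the exponential brute-force checker is included only to cross-validate the closed form on tiny inputs.

```python
import itertools


def tolerates_k_hidden_faults(lengths, deadlines, k):
    """Closed-form test (simplified model, hypothetical helper): True if every
    job meets its deadline under every pattern of at most k hidden faults."""
    elapsed = 0  # fault-free completion time of the jobs seen so far
    longest = 0  # length of the longest job seen so far
    for p, d in zip(lengths, deadlines):
        elapsed += p
        longest = max(longest, p)
        # Worst case for this job: all k faults strike runs of the longest
        # job up to and including this one, each wasting one full run of it.
        if elapsed + k * longest > d:
            return False
    return True


def brute_force_tolerates(lengths, deadlines, k):
    """Exhaustive check over all ways to assign up to k faults to jobs;
    exponential in k, intended only for cross-checking on tiny inputs."""
    n = len(lengths)
    for r in range(k + 1):
        for faulty in itertools.combinations_with_replacement(range(n), r):
            t = 0
            for j, (p, d) in enumerate(zip(lengths, deadlines)):
                # Job j runs once, plus once more for every fault that hits it.
                t += p * (1 + faulty.count(j))
                if t > d:
                    return False
    return True


if __name__ == "__main__":
    print(tolerates_k_hidden_faults([2, 3, 1], [6, 12, 13], 2))  # True
    print(tolerates_k_hidden_faults([2, 3, 1], [6, 12, 13], 3))  # False
```

With lengths [2, 3, 1] and deadlines [6, 12, 13], this schedule survives any two faults but not three: three faults on the first job push its completion to 2 · 4 = 8 > 6. The single pass runs in O(n) time independent of k; the paper's own algorithms cover its full fault models and may work quite differently.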