Algorithms for testing fault-tolerance of sequenced jobs

Authors:
Marek Chrobak; Mathilde Hurand; Jiří Sgall
Affiliations:
Department of Computer Science, University of California, Riverside, USA; Department d'Informatique (LIX), Ecole Polytechnique, Palaiseau, France; Department of Applied Mathematics, Charles University, Praha 1, Czech Republic 11800
Venue:
Journal of Scheduling
Year:
2009



Abstract

We study the problem of testing whether a given set of sequenced jobs can tolerate transient faults. We present efficient algorithms for this problem in several fault models. A fault model describes what types of faults are allowed and specifies assumptions on their frequency. Two types of faults are considered: hidden faults, which can only be detected after a job completes, and exposed faults, which can be detected immediately. First, we give an O(n)-time fault-tolerance testing algorithm, for both exposed and hidden faults, when the number of faults does not exceed a given parameter k. Then we consider the model in which any two faults are separated in time by a gap of length at least Δ, where Δ is at least twice the maximum job length. For exposed faults, we give an O(n)-time algorithm. For hidden faults, we give an algorithm with running time O(n²), and we prove that if job lengths are distributed uniformly over an interval [0, p_max], then this algorithm's expected running time is O(n). Our experimental study shows that this linear-time performance extends to other distributions. Finally, we provide evidence that improving the worst-case performance may not be possible, by proving an Ω(n²) lower bound, in the algebraic computation tree model, on a slight generalization of this problem.
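To make the k-fault model concrete, the following is a minimal Python sketch of a linear-time test under simplifying assumptions made here for illustration only. Jobs run back to back in the given order from time 0. Job i has length p_i and deadline d_i. Each fault forces one full re-execution of the job it hits and wastes at most p_i time. The function name `tolerates_k_faults` and this exact model are hypothetical, and the sketch is not claimed to be the authors' algorithm. Under these assumptions, every fault that can delay job j hits some job among the first j, so it wastes at most max_{i≤j} p_i. The worst case therefore puts all k faults on that longest job, and job j is safe exactly when p_1 + ... + p_j + k · max_{i≤j} p_i ≤ d_j. A single pass can check this condition.

```python
from typing import Sequence


def tolerates_k_faults(lengths: Sequence[float], deadlines: Sequence[float], k: int) -> bool:
    """Return True if every job meets its deadline under any pattern of at most k faults.

    Hypothetical model assumed for this sketch: jobs run back to back in the given
    order starting at time 0, and a fault during an execution of job i wastes at
    most lengths[i] time units and forces one full re-execution of job i.
    """
    if len(lengths) != len(deadlines):
        raise ValueError("lengths and deadlines must have the same size")
    elapsed = 0   # fault-free completion time of the current job
    longest = 0   # longest job executed so far, including the current one
    for p, d in zip(lengths, deadlines):
        elapsed += p
        longest = max(longest, p)
        # Any fault that delays the current job hits a job executed so far, so it
        # wastes at most `longest`; putting all k faults there is the worst case.
        if elapsed + k * longest > d:
            return False
    return True


if __name__ == "__main__":
    print(tolerates_k_faults([2, 3, 1], [6, 11, 12], k=2))  # worst-case bounds 6, 11, 12
    print(tolerates_k_faults([2, 3, 1], [6, 11, 12], k=3))  # first job needs 8 > 6
```

With lengths [2, 3, 1], deadlines [6, 11, 12], and k = 2, the worst-case completion times are 6, 11, and 12, so every deadline is met exactly. With k = 3, the first job already fails. For exposed faults under this model, the wasted time is strictly less than p_i, because the fault is detected before the job finishes. The same check is therefore exact except at the boundary case of equality, where it is slightly conservative.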