Exploiting redundancies to enhance schedulability in fault-tolerant and real-time distributed systems

Authors:
Wei Luo;Xiao Qin;Xian-Chun Tan;Ke Qin;Adam Manzanares
Affiliations:
Department of Information System, China Ship Development and Design Center, Wuhan, China;Department of Computer Science and Software Engineering, Auburn University, Auburn, AL;Department of Information System, China Ship Development and Design Center, Wuhan, China;Department of Information System, China Ship Development and Design Center, Wuhan, China;Department of Computer Science and Software Engineering, Auburn University, Auburn, AL
Venue:
IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans
Year:
2009

Abstract

Over the past decades, distributed systems have been widely applied to real-time applications, most of which have fault-tolerance requirements to assure high reliability. Due to the stringent space constraints of real-time systems, schedulability becomes a major concern in the design of fault-tolerant and real-time distributed systems. Most existing real-time and fault-tolerant scheduling algorithms, which are based on the primary-backup scheme for periodic real-time tasks, introduce unnecessary redundancies by aggressively using active-backup copies. To solve this problem, we propose two novel fault-tolerant techniques, which are seamlessly integrated with fixed-priority-based scheduling algorithms. These techniques leverage redundancies to enhance schedulability in fault-tolerant and real-time distributed systems. Both techniques use the primary-backup scheme to tolerate permanent hardware failures. The first technique (referred to as Tercos) terminates the execution of active-backup copies when the corresponding primary copies complete successfully. By executing portions of active-backup copies in passive form, Tercos reduces schedule lengths in fault-free scenarios and thereby enhances schedulability. The second technique (referred to as Debus) uses a deferred-active-backup scheme to further reduce schedule lengths and improve schedulability. Debus schedules active-backup copies as late as possible, while still terminating them when their primary copies complete. Experimental results show that, compared with existing algorithms in the literature, Tercos improves schedulability by up to 17.0% (with an average of 9.7%). Furthermore, empirical results reveal that Debus enhances schedulability over Tercos by up to 12% (with an average of 7.8%).
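
The effect of the two techniques on a backup copy can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single task whose primary and backup copies have the same execution time and run on different processors, and it only accounts for the fault-free case. The names `Copy`, `backup_time_used`, and `deferred_backup` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy (primary or backup) of a task on some processor."""
    start: float      # scheduled start time
    exec_time: float  # worst-case execution time

    @property
    def finish(self) -> float:
        return self.start + self.exec_time


def backup_time_used(primary: Copy, backup: Copy, terminate_on_success: bool) -> float:
    """Processor time the backup consumes when the primary does not fail.

    Without termination the backup always runs to completion.
    With termination (the Tercos idea, as described in the abstract) only the
    part of the backup that overlaps the primary is executed.
    """
    if not terminate_on_success:
        return backup.exec_time
    overlap = primary.finish - backup.start
    return min(backup.exec_time, max(0.0, overlap))


def deferred_backup(primary: Copy, deadline: float) -> Copy:
    """Place the backup as late as possible (the Debus idea, as described in
    the abstract): it finishes exactly at the deadline.

    Assumption: the backup needs the same execution time as the primary.
    """
    return Copy(start=deadline - primary.exec_time, exec_time=primary.exec_time)


if __name__ == "__main__":
    primary = Copy(start=0.0, exec_time=4.0)
    early = Copy(start=1.0, exec_time=4.0)          # backup started early
    late = deferred_backup(primary, deadline=6.0)   # backup deferred

    print(backup_time_used(primary, early, terminate_on_success=False))
    print(backup_time_used(primary, early, terminate_on_success=True))
    print(backup_time_used(primary, late, terminate_on_success=True))
```

In this example the primary copy runs from time 0 to 4 and the deadline is 6. A backup started at time 1 consumes 4 time units if it always runs to completion, 3 if it is terminated when the primary completes, and 2 if it is also deferred so that it starts at time 2. The processor time saved on the backup's processor is what becomes available to other tasks in the fault-free case.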