Boosting adaptivity of fault-tolerant scheduling for real-time tasks with service requirements on clusters

  • Authors: Xiaomin Zhu; Chuan He; Rong Ge; Peizhong Lu

  • Affiliations: Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha 410073, PR China; Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha 410073, PR China; Department of Mathematics, Statistics, and Computer Science, Marquette University, Milwaukee, WI 53233, USA; School of Computer Science, Fudan University, Shanghai 200433, PR China

  • Venue: Journal of Systems and Software
  • Year: 2011

Abstract

Thanks to their excellent extensibility and usability, computer clusters have become the dominant platform for parallel computing. Fault tolerance is mandatory for safety-critical applications running on clusters. In this paper, we propose a service-aware and adaptive fault-tolerant scheduling algorithm using overlapping technologies (SAO for short) that can tolerate a node's permanent failure at any time instant for real-time tasks with service requirements in heterogeneous clusters. SAO adopts the primary/backup model and considers timing constraints, service requirements, and system resource utilization. To improve system resource utilization, we employ backup-backup (BB) and primary-backup (PB) overlapping technologies and analyze the overlapping constraints. In addition, SAO achieves high system adaptivity by dynamically adjusting the service levels of tasks based on system load. Furthermore, to improve resource utilization and schedulability, SAO either has backup copies adopt a passive execution scheme or reduces the overlapping execution time of a task's primary and backup copies as much as possible. In simulation experiments comparing SAO with a baseline algorithm, SAWO (a service-aware and adaptive fault-tolerant scheduling algorithm without overlapping technologies), and an existing algorithm, DYFARS, SAO achieves an average improvement of 51.25% in performability.
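
The primary/backup model and BB overlapping mentioned above can be illustrated with a minimal sketch. It is not the SAO algorithm: the abstract does not state the paper's overlapping constraints, so the checks below assume the reasoning commonly used under a single-node-failure model (a task's two copies sit on different nodes, and two backups may share a time slot on one node only if their primaries sit on different nodes, because one failure can then activate at most one of them). The names `Copy`, `valid_primary_backup`, and `bb_overlap_allowed` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One scheduled copy (primary or backup) of a task. Hypothetical type."""
    task: str
    node: int
    start: float
    finish: float


def intervals_overlap(a: Copy, b: Copy) -> bool:
    """True if the two copies' execution windows intersect in time."""
    return a.start < b.finish and b.start < a.finish


def valid_primary_backup(primary: Copy, backup: Copy, deadline: float) -> bool:
    """Assumed placement rule: copies on different nodes, both done by the deadline."""
    return (
        primary.node != backup.node
        and primary.finish <= deadline
        and backup.finish <= deadline
    )


def bb_overlap_allowed(primary_a: Copy, backup_a: Copy,
                       primary_b: Copy, backup_b: Copy) -> bool:
    """Assumed BB rule under a single-node-failure model.

    If the two backups do not share a node and a time window, there is
    nothing to check. If they do, the overlap is safe only when the two
    primaries are on different nodes, so that one node failure cannot
    require both backups to run at once.
    """
    shares_slot = (backup_a.node == backup_b.node
                   and intervals_overlap(backup_a, backup_b))
    if not shares_slot:
        return True
    return primary_a.node != primary_b.node


if __name__ == "__main__":
    p1, b1 = Copy("t1", 0, 0.0, 2.0), Copy("t1", 2, 3.0, 5.0)
    p2, b2 = Copy("t2", 1, 0.0, 2.0), Copy("t2", 2, 4.0, 6.0)
    print(valid_primary_backup(p1, b1, deadline=5.0))
    print(bb_overlap_allowed(p1, b1, p2, b2))
```

Here the backups of `t1` and `t2` overlap on node 2, which the assumed rule permits because their primaries run on nodes 0 and 1. PB overlapping and service-level adjustment are left out because the abstract gives no detail on either.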