Fault-tolerant scheduling for differentiated classes of tasks with low replication cost in computational grids

  • Authors:
  • Qin Zheng;Bharadwaj Veeravalli;Chen-Khong Tham

  • Affiliations:
  • National University of Singapore;National University of Singapore;National University of Singapore

  • Venue:
  • Proceedings of the 16th International Symposium on High Performance Distributed Computing
  • Year:
  • 2007

Abstract

Fault-tolerant scheduling is essential for large-scale computational Grid systems, in which geographically distributed nodes often cooperate to execute a task. The primary-backup approach is a common fault-tolerance methodology in which each task has a primary copy and a backup copy on two different processors. Backup overloading has been proposed to reduce replication cost by allowing a backup copy to overlap in time with other backup copies on the same processor. In this paper, we consider two classes of independent tasks, both of which have fault-tolerance requirements. Class 1 tasks require the response time to be as short as possible when a fault occurs, while Class 2 tasks prefer backups with minimum replication cost. We propose two algorithms, MRC-ECT and MCT-LRC. MRC-ECT is shown to guarantee an optimal backup schedule in terms of replication cost, while MCT-LRC can schedule a backup with minimum completion time and low replication cost. We conduct extensive simulation experiments to quantify the performance of the proposed algorithms.
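
To make the primary-backup and backup-overloading ideas concrete, the following is a minimal sketch and not the paper's MRC-ECT or MCT-LRC algorithm. It assumes the rule commonly used in the backup-overloading literature: under a single-processor-failure model, two backups may share a time slot on one processor only if their primaries run on different processors. The names `Slot`, `Task`, and `can_place_backup` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """A reserved time interval [start, end) on one processor."""
    processor: int
    start: float
    end: float


@dataclass(frozen=True)
class Task:
    """A scheduled task with its primary and backup reservations."""
    name: str
    primary: Slot
    backup: Slot


def overlaps(a: Slot, b: Slot) -> bool:
    """True if both slots are on the same processor and intersect in time."""
    return a.processor == b.processor and a.start < b.end and b.start < a.end


def can_place_backup(new_primary: Slot, new_backup: Slot, scheduled: list) -> bool:
    """Check a candidate backup slot against already scheduled tasks.

    Assumed rules (illustrative, single-fault model):
      1. Primary and backup must be on different processors.
      2. A backup may never overlap an existing primary copy.
      3. A backup may overlap an existing backup (overloading) only if
         the two primaries are on different processors, so one failure
         cannot activate both backups at once.
    """
    if new_primary.processor == new_backup.processor:
        return False
    for task in scheduled:
        if overlaps(new_backup, task.primary):
            return False
        if (overlaps(new_backup, task.backup)
                and task.primary.processor == new_primary.processor):
            return False
    return True


# Usage: t1 has its primary on processor 0 and its backup on processor 2.
t1 = Task("t1", Slot(0, 0, 4), Slot(2, 4, 8))
# Primary on processor 1: its backup may overload t1's backup on processor 2.
ok = can_place_backup(Slot(1, 0, 4), Slot(2, 5, 9), [t1])
# Primary also on processor 0: overloading t1's backup is rejected.
rejected = can_place_backup(Slot(0, 4, 8), Slot(2, 6, 10), [t1])
```

In this sketch, overloading lowers replication cost because the overlapped portion of the second backup reuses time already reserved for the first; how that cost is measured and minimized is specific to the paper's algorithms and is not modeled here.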