Fault-tolerant scheduling for differentiated classes of tasks with low replication cost in computational grids

Authors:
Qin Zheng; Bharadwaj Veeravalli; Chen-Khong Tham
Affiliations:
National University of Singapore (all authors)
Venue:
Proceedings of the 16th International Symposium on High Performance Distributed Computing
Year:
2007

Abstract

Fault-tolerant scheduling is an essential step for large-scale computational Grid systems, in which geographically distributed nodes often co-operate to execute a task. The primary-backup approach is a common fault-tolerance methodology, wherein each task has a primary copy and a backup copy on two different processors. Backup overloading has been proposed to reduce replication cost by allowing a backup copy to be overloaded with other backup copies on the same processor, that is, to occupy the same time slots. In this paper, we consider two classes of independent tasks, both of which have fault-tolerance requirements. Class 1 tasks require the response time to be as short as possible when a fault occurs, while Class 2 tasks prefer backups with minimum replication cost. We propose two algorithms, called MRC-ECT and MCT-LRC. Algorithm MRC-ECT is shown to guarantee an optimal backup schedule in terms of replication cost, while MCT-LRC can schedule a backup with minimum completion time and low replication cost. We conduct extensive simulation experiments to quantify the performance of the proposed algorithms.
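The abstract does not spell out either algorithm. To illustrate only the two objectives it names, the Python sketch below places a single backup copy and is not the authors' MRC-ECT or MCT-LRC procedure. Several assumptions are not taken from the paper. At most one processor fails at a time. Time is integer-valued and slots are half-open. Two backups may be overloaded only when their primaries run on different processors, which is the usual overloading condition. Replication cost is taken to be the part of the backup's slot not already reserved by other backups. The names `Processor`, `candidates`, `place_backup`, `"min_cost"` and `"min_finish"` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    # Hypothetical processor model: half-open slots [start, end) in integer time units.
    pid: int
    primaries: list = field(default_factory=list)  # [(start, end)] primary copies
    backups: list = field(default_factory=list)    # [(start, end, primary_pid)]


def merge(intervals):
    """Return the union of half-open intervals as sorted, disjoint [a, b] pairs."""
    merged = []
    for a, b in sorted(intervals):
        if merged and a <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], b)
        else:
            merged.append([a, b])
    return merged


def candidates(proc, primary_pid, est, dur):
    """Yield (start, replication_cost) for each feasible backup start on proc."""
    # Slots the new backup may not share: primaries running on proc, and backups
    # whose primaries sit on the same processor as ours (a single fault there
    # would activate both backups at once).
    hard = list(proc.primaries)
    hard += [(a, b) for a, b, p in proc.backups if p == primary_pid]
    # Backups guarding primaries on other processors can be overloaded.
    shared = merge((a, b) for a, b, p in proc.backups if p != primary_pid)
    # Cost is piecewise linear in the start time, so it suffices to test est and
    # the interval endpoints, also shifted back by dur.
    points = {est}
    for a, b in hard:
        points.update((b, a - dur))
    for a, b in shared:
        points.update((a, b, a - dur, b - dur))
    for s in sorted(x for x in points if x >= est):
        if any(s < b and s + dur > a for a, b in hard):
            continue  # would collide with a slot it may not share
        overlap = sum(max(0, min(b, s + dur) - max(a, s)) for a, b in shared)
        yield s, dur - overlap  # only non-overlapped time is newly reserved


def place_backup(procs, primary_pid, est, dur, policy):
    """Return (pid, start, cost) chosen by the hypothetical policy."""
    options = [
        (proc.pid, s, cost)
        for proc in procs
        if proc.pid != primary_pid  # backup never shares the primary's processor
        for s, cost in candidates(proc, primary_pid, est, dur)
    ]
    if policy == "min_cost":  # lowest replication cost, then earliest finish
        key = lambda o: (o[2], o[1] + dur, o[0])
    elif policy == "min_finish":  # earliest finish, then lowest cost
        key = lambda o: (o[1] + dur, o[2], o[0])
    else:
        raise ValueError(f"unknown policy: {policy}")
    return min(options, key=key)


if __name__ == "__main__":
    procs = [
        Processor(0),
        Processor(1, primaries=[(0, 3)], backups=[(3, 7, 2)]),
        Processor(2),
    ]
    # New task: primary on P0, backup may start at t=0 and runs 4 units.
    print(place_backup(procs, 0, 0, 4, "min_cost"))    # (1, 3, 0)
    print(place_backup(procs, 0, 0, 4, "min_finish"))  # (2, 0, 4)
```

In this toy example, P1 is busy until time 3 and already holds a backup in [3, 7) for a task whose primary is on P2. The cost-first policy therefore overloads onto P1 at time 3, reserving no new time but finishing at 7. The finish-first policy picks the idle P2 at time 0, finishing at 4 but reserving 4 new units. This mirrors the trade-off between Class 2 and Class 1 tasks described above.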