Replication based fault tolerant job scheduling strategy for economy driven grid

Authors:
Babar Nazir;Kalim Qureshi;Paul Manuel
Affiliations:
Department of Computer Science, COMSATS Institute of Information Technology, Abbottabad, Pakistan 22060;Department of Information Science, Kuwait University, Safat, Kuwait 13060;Department of Information Science, Kuwait University, Safat, Kuwait 13060
Venue:
The Journal of Supercomputing
Year:
2012

Abstract

This paper addresses the problem of fault tolerance in grid computing and proposes a novel adaptive, task-replication-based, fault-tolerant job scheduling strategy for economy-driven grids. The proposed strategy maintains a fault history for each resource, termed the resource fault index. The fault index entry for a resource is updated according to whether the resource successfully completes or fails an assigned task. The Grid Resource Broker then replicates the task (submitting the same task to different backup resources) with an intensity that depends on the resource's vulnerability to faults, as indicated by its fault index. Consequently, if a fault occurs at a resource, the results of the replicated task(s) on the backup resource(s) can be used. Hence, user jobs can be completed within the specified deadline and assigned budget, even in the event of faults at grid resources. Through extensive simulations, the performance of the proposed strategy is evaluated and compared with the Time Optimization Strategy and the Checkpointing based Strategy in an economy-driven grid environment. The experimental results demonstrate that, in the presence of faults, the proposed strategy increases the number of tasks completed under a varied deadline and fixed budget, as well as under a varied budget and fixed deadline. Additionally, the proposed strategy uses a smaller percentage of the deadline time than both the Time Optimization Strategy and the Checkpointing based Strategy. Although the proposed strategy spends a greater percentage of the budget than either of these strategies, this is acceptable in time optimization, where the main objective is to maximize the number of tasks completed within a given deadline. The experiments indicate that the proposed strategy improves the satisfaction of user QoS requirements: it schedules tasks effectively and tolerates failures gracefully, at the cost of slightly higher budget consumption. Hence, the proposed fault-tolerant strategy helps sustain users' faith in the grid by enabling it to deliver reliable and consistent performance in the presence of faults.
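
The mechanism described in the abstract, a per-resource fault index that drives how many replicas of a task are submitted, can be illustrated with a minimal Python sketch. The abstract does not give the exact update rule or the mapping from fault index to replication intensity, so both are assumptions here: the index rises by one on a failure, falls by one (never below zero) on a success, and the number of backup copies equals the index, capped at a hypothetical `max_replicas`. The names `FaultIndexTable`, `record`, `replicas_for`, and `choose_resources` are illustrative and do not come from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FaultIndexTable:
    """Per-resource fault history (the abstract's 'resource fault index')."""

    index: dict[str, int] = field(default_factory=dict)

    def record(self, resource: str, succeeded: bool) -> None:
        # Assumed update rule: +1 on failure, -1 on success, floor at zero.
        current = self.index.get(resource, 0)
        if succeeded:
            self.index[resource] = max(0, current - 1)
        else:
            self.index[resource] = current + 1

    def replicas_for(self, resource: str, max_replicas: int = 3) -> int:
        # Assumed mapping: one backup copy per recorded fault, capped.
        return min(self.index.get(resource, 0), max_replicas)


def choose_resources(
    primary: str, backups: list[str], table: FaultIndexTable
) -> list[str]:
    """Return the primary resource plus as many backups as its fault index suggests."""
    n = table.replicas_for(primary)
    return [primary] + backups[:n]


if __name__ == "__main__":
    table = FaultIndexTable()
    table.record("r1", succeeded=False)
    table.record("r1", succeeded=False)
    # r1 has failed twice, so the task is also sent to two backup resources.
    print(choose_resources("r1", ["r2", "r3", "r4", "r5"], table))
```

Under these assumptions, a resource with a clean history receives no replicas, which keeps budget consumption low, while a fault-prone resource triggers more backup submissions. This reflects the trade-off the abstract reports between tasks completed and budget spent.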