Adaptive checkpointing strategy to tolerate faults in economy based grid

Authors:
Babar Nazir; Kalim Qureshi; Paul Manuel
Affiliations:
Department of Computer Science, COMSATS Institute of Information Technology, Abbottabad, Pakistan 22060; Department of Mathematics and Computer Science, Kuwait University, Safat, State of Kuwait 13060; Department of Information Science, Kuwait University, Safat, State of Kuwait 13060
Venue:
The Journal of Supercomputing
Year:
2009

Abstract

In this paper, we propose a novel adaptive task checkpointing based fault tolerant job scheduling strategy that tolerates faults gracefully in an economy based grid environment. The proposed strategy maintains a fault index for each grid resource and updates it dynamically according to whether each assigned task completes successfully or unsuccessfully. Whenever a grid resource broker has tasks to schedule on grid resources, it uses the fault index supplied by the fault tolerant schedule manager in addition to a time optimization heuristic. While scheduling a grid job on a grid resource, the resource broker uses the fault index to apply different intensities of task checkpointing (inserting checkpoints in a task at different intervals). To simulate and evaluate the performance of the proposed strategy, this paper enhances the GridSim Toolkit-4.0 to exhibit fault tolerance related behavior. We also compare the "checkpointing fault tolerant job scheduling strategy" with the well-known time optimization heuristic in an economy based grid environment. From the measured results, we conclude that, even in the presence of faults, the proposed strategy schedules grid jobs effectively, tolerates faults gracefully, and executes more jobs successfully within the specified deadline and allotted budget. It also improves the overall execution time and minimizes the execution cost of grid jobs.
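The fault index mechanism described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the update rule or the mapping from fault index to checkpoint interval, so everything specific here is an assumption made for illustration: the index rises by one on an unsuccessful completion and falls by one (never below zero) on a successful one, and the checkpoint interval shrinks as the index grows. The names `FaultIndexTable`, `checkpoint_interval`, `base_interval`, and `min_interval` are hypothetical and are not part of GridSim or of the paper.

```python
from dataclasses import dataclass, field


@dataclass
class FaultIndexTable:
    """Per-resource fault index (hypothetical structure, not from the paper)."""
    index: dict = field(default_factory=dict)

    def record(self, resource: str, succeeded: bool) -> int:
        """Update and return the fault index after a task finishes.

        Assumed rule: +1 on failure, -1 on success, floored at 0.
        """
        current = self.index.get(resource, 0)
        if succeeded:
            current = max(0, current - 1)
        else:
            current += 1
        self.index[resource] = current
        return current


def checkpoint_interval(fault_index: int,
                        base_interval: float = 600.0,
                        min_interval: float = 60.0) -> float:
    """Seconds between checkpoints for a resource with the given fault index.

    Assumed mapping: a higher fault index gives a shorter interval
    (more intensive checkpointing), bounded below by min_interval.
    """
    return max(min_interval, base_interval / (1 + fault_index))


if __name__ == "__main__":
    table = FaultIndexTable()
    # Two failures followed by one success on the same resource.
    for outcome in (False, False, True):
        fi = table.record("resource-A", outcome)
        print(fi, checkpoint_interval(fi))
```

Under these assumptions, a resource that fails more often is checkpointed more frequently, so less work is lost when it fails again, while a reliable resource pays little checkpointing overhead. The actual strategy may use a different update rule and a different mapping.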