In this paper, we develop a fault-tolerant job scheduling strategy that tolerates faults gracefully in an economy-based grid environment. We propose a novel adaptive strategy based on task checkpointing. The proposed strategy maintains a fault index for grid resources and updates it dynamically according to whether an assigned task completes successfully or unsuccessfully. Whenever a grid resource broker has tasks to schedule on grid resources, it uses the fault index from the fault-tolerant schedule manager in addition to a time optimization heuristic. While scheduling a grid job on a grid resource, the resource broker uses the fault index to apply a different intensity of task checkpointing (inserting checkpoints in a task at different intervals).

To simulate and evaluate the performance of the proposed strategy, this paper enhances the GridSim Toolkit-4.0 to exhibit fault tolerance related behavior. We also compare the checkpointing-based fault-tolerant job scheduling strategy with the well-known time optimization heuristic in an economy-based grid environment. From the measured results, we conclude that even in the presence of faults, the proposed strategy schedules grid jobs effectively, tolerates faults gracefully, and executes more jobs successfully within the specified deadline and allotted budget. It also improves the overall execution time and reduces the execution cost of grid jobs.
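
The fault-index idea described above can be illustrated with a minimal Python sketch. The abstract does not give the update rule or the mapping from fault index to checkpoint spacing, so everything here is an assumption made for illustration: the names `Resource`, `update_fault_index`, and `checkpoint_interval` are hypothetical, the index is assumed to rise by one on a failed task and fall by one (never below zero) on a successful one, and the number of checkpoints is assumed to equal the fault index. It is not the paper's algorithm and does not use the GridSim API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    """A grid resource with a hypothetical integer fault index."""
    name: str
    fault_index: int = 0  # assumption: higher means more observed failures


def update_fault_index(resource: Resource, task_succeeded: bool) -> None:
    """Assumed rule: +1 on failure, -1 on success, never below zero."""
    if task_succeeded:
        resource.fault_index = max(0, resource.fault_index - 1)
    else:
        resource.fault_index += 1


def checkpoint_interval(resource: Resource, task_length: float) -> Optional[float]:
    """Return the spacing between checkpoints for a task of the given length.

    Assumed mapping: a resource with fault index k gets k evenly spaced
    checkpoints, so a less reliable resource is checkpointed more often.
    Returns None when no checkpointing is applied (fault index 0).
    """
    if resource.fault_index == 0:
        return None
    checkpoints = resource.fault_index
    # k checkpoints split the task into k + 1 equal segments.
    return task_length / (checkpoints + 1)


if __name__ == "__main__":
    r = Resource("resource-1")
    update_fault_index(r, task_succeeded=False)
    update_fault_index(r, task_succeeded=False)
    print(r.fault_index, checkpoint_interval(r, 300.0))
```

In this sketch a broker would call `update_fault_index` after each task finishes and `checkpoint_interval` before dispatching the next one; a real scheduler would combine this with the time optimization heuristic and the deadline and budget constraints mentioned in the abstract.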