To achieve a high level of reliability and availability, a grid infrastructure must be fault tolerant. Because resource failures can be fatal to job execution, a fault tolerance service is essential for meeting QoS requirements in grid computing. In this paper we propose two hybrid fault tolerance techniques (FTTs): alternate task with checkpoint and alternate task with retry. These hybrid FTTs inherit the strengths of workflow-level and task-level FTTs while overcoming their limitations. We evaluate the performance of the proposed FTTs under different experimental environments and conclude that alternate task with checkpoint improves the reliability of a grid system more significantly than alternate task with retry.