A hybrid fault tolerance technique in grid computing system

Authors:
Kalim Qureshi;Fiaz Gul Khan;Paul Manuel;Babar Nazir
Affiliations:
Information Science Dept., Kuwait University, Kuwait City, Kuwait;COMSATS Institute of Information Technology, Abbottabad, Pakistan;Information Science Dept., Kuwait University, Kuwait City, Kuwait;COMSATS Institute of Information Technology, Abbottabad, Pakistan
Venue:
The Journal of Supercomputing
Year:
2011

Abstract

To achieve a high level of reliability and availability, the grid infrastructure should be foolproof and fault tolerant. Fault tolerance plays a key role in ensuring the availability and reliability of a grid system. Since resource failures can be fatal to job execution, a fault tolerance service is essential for meeting QoS requirements in grid computing. In this paper, we propose two hybrid fault tolerance techniques (FTTs), called alternate task with checkpoint and alternate task with retry. These hybrid FTTs inherit the strengths and overcome the limitations of workflow-level and task-level FTTs. We evaluate the performance of the proposed FTTs under different experimental environments. Finally, we conclude that alternate task with checkpoint improves the reliability of a grid system more significantly than alternate task with retry.
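The abstract does not describe the internal mechanics of the two hybrid techniques. The following is a purely illustrative sketch under our own assumptions. It is not the authors' algorithm. It assumes the following model:

* A task consists of `task_units` equal work units.
* Each resource is modeled by a hypothetical failure predicate.
* The primary resource is tried up to `max_attempts` times before the task moves to an alternate resource (the workflow-level "alternate task" idea).
* When checkpointing is enabled, progress saved every `checkpoint_every` units is assumed to live in shared stable storage, so a later attempt or alternate can resume from it.

With `checkpoint_every=None`, every attempt restarts from scratch, which models alternate task with retry. An integer value models alternate task with checkpoint. The names `Resource` and `run_with_alternates` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Resource:
    """Hypothetical grid resource model (illustrative assumption)."""
    name: str
    # fails(unit, attempt) -> True if the resource crashes while executing
    # work unit `unit` during its `attempt`-th try (0-based).
    fails: Callable[[int, int], bool]


def run_with_alternates(
    task_units: int,
    resources: List[Resource],
    max_attempts: int,
    checkpoint_every: Optional[int] = None,
) -> Tuple[str, int]:
    """Run a task on the primary resource, then on alternates.

    checkpoint_every=None: each attempt restarts from unit 0 (retry).
    checkpoint_every=k: progress is saved after every k completed units,
    and later attempts, including those on alternates, resume from it.
    Returns (name of resource that finished, total units executed).
    """
    executed = 0  # all units run, including work lost to crashes
    saved = 0     # last checkpointed unit (assumed in shared stable storage)
    for res in resources:  # resources[0] is the primary, the rest are alternates
        for attempt in range(max_attempts):
            unit = saved if checkpoint_every else 0
            crashed = False
            while unit < task_units:
                if res.fails(unit, attempt):
                    crashed = True
                    break
                executed += 1
                unit += 1
                if checkpoint_every and unit % checkpoint_every == 0:
                    saved = unit
            if not crashed:
                return res.name, executed
    raise RuntimeError("task failed on every resource")


if __name__ == "__main__":
    primary = Resource("primary", lambda unit, attempt: unit == 7)  # permanent fault
    alternate = Resource("alternate", lambda unit, attempt: False)  # never fails
    print(run_with_alternates(10, [primary, alternate], 2))  # ('alternate', 24)
    print(run_with_alternates(10, [primary, alternate], 2, checkpoint_every=5))  # ('alternate', 14)
```

In this toy run, a permanent fault at unit 7 costs 24 executed units with retry but only 14 with checkpointing. The saved progress at unit 5 is reused by the second attempt and by the alternate. The direction matches the abstract's conclusion. The numbers, however, come only from this hypothetical failure model, not from the paper's experiments.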