Performance evaluation of fault tolerance techniques in grid computing system

Authors:
Fiaz Gul Khan;Kalim Qureshi;Babar Nazir
Affiliations:
Department of Computer Science, COMSATS Institute of Information Technology, Abbottabad, Pakistan;Information Science Department, Kuwait University, Kuwait;Department of Computer Science, COMSATS Institute of Information Technology, Abbottabad, Pakistan
Venue:
Computers and Electrical Engineering
Year:
2010


Abstract

Fault tolerance is the ability of a system to perform its function correctly even in the presence of faults. Fault tolerance techniques (FTTs) are therefore critical for the efficient utilization of expensive resources in high-performance grid computing systems, and they are an important component of grid workflow management systems. This paper presents a performance evaluation of the most commonly used FTTs in grid computing systems. We consider several system-centric parameters, such as throughput, turnaround time, waiting time, and network delay, in evaluating these FTTs. For a comprehensive evaluation, we set up various conditions in which we vary the average percentage of faults in the system along with the workload, in order to observe the behavior of the FTTs under these conditions. The empirical evaluation shows that workflow-level alternative task techniques perform better than task-level checkpointing techniques. This comparative study will help grid computing researchers understand the behavior and performance of different FTTs in detail.