Fault tolerance is the ability of a system to perform its function correctly even in the presence of faults. Fault tolerance techniques (FTTs) are therefore critical for the efficient utilization of expensive resources in high-performance grid computing systems, and they are an important component of grid workflow management systems. This paper presents a performance evaluation of the most commonly used FTTs in grid computing systems. We consider several system-centric parameters for the evaluation: throughput, turnaround time, waiting time, and network delay. For a comprehensive evaluation, we set up various conditions in which we vary the average percentage of faults in the system along with the workload, in order to observe the behavior of the FTTs under these conditions. The empirical evaluation shows that workflow-level alternative task techniques perform better than task-level checkpointing techniques. This comparative study will help grid computing researchers understand the behavior and performance of different FTTs in detail.