In this paper, we propose WorkQueue with Replication Fault Tolerant (WQR-FT), a fault-tolerant scheduler for Bag-of-Tasks Grid applications obtained by adding checkpointing and replication to the WorkQueue with Replication (WQR) scheduling algorithm. Using discrete-event simulation, we show that WQR-FT not only ensures the successful completion of all tasks in a bag, but also outperforms both WQR and the fault-tolerant schedulers obtained by coupling WQR with replication only or with checkpointing only.
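To make the combination of replication and checkpointing concrete, the following is a minimal sketch of a WorkQueue-style dispatcher. It is not the WQR-FT algorithm as specified in the paper: the class and method names, the `max_replicas` threshold, the choice of which task to replicate, and the assumption that all replicas of a task share its latest checkpoint are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    name: str
    work: float               # total work units the task needs
    checkpoint: float = 0.0   # work already saved by the latest checkpoint
    replicas: int = 0         # replicas currently running
    done: bool = False


@dataclass
class Scheduler:
    tasks: List[Task]
    max_replicas: int = 2     # hypothetical replication threshold
    pending: List[Task] = field(default_factory=list, init=False)

    def __post_init__(self) -> None:
        # Initially every task in the bag is waiting for a machine.
        self.pending = list(self.tasks)

    def next_task(self) -> Optional[Task]:
        """Choose work for an idle machine."""
        if self.pending:
            # WorkQueue step: hand out tasks that are not running yet.
            task = self.pending.pop(0)
        else:
            # Replication step: duplicate the running task that has the
            # fewest replicas, as long as it is below the threshold.
            candidates = [t for t in self.tasks
                          if not t.done and 0 < t.replicas < self.max_replicas]
            if not candidates:
                return None
            task = min(candidates, key=lambda t: t.replicas)
        task.replicas += 1
        return task

    def on_checkpoint(self, task: Task, progress: float) -> None:
        # Keep the most advanced saved state (assumed shared by replicas).
        task.checkpoint = max(task.checkpoint, progress)

    def on_failure(self, task: Task) -> None:
        # One replica was lost; requeue the task if none is left running.
        task.replicas -= 1
        if task.replicas == 0 and not task.done:
            self.pending.append(task)

    def on_completion(self, task: Task) -> None:
        # The first replica to finish completes the task; the others stop.
        task.done = True
        task.replicas = 0

    def remaining_work(self, task: Task) -> float:
        # A restarted replica resumes from the latest checkpoint.
        return task.work - task.checkpoint
```

In this sketch, a task whose replicas have all failed goes back to the queue and restarts from its last checkpoint rather than from the beginning, which is the intuition behind combining the two mechanisms.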