In this paper, we design pro-active failure handling strategies for grid environments. These strategies estimate the availability of resources in the grid and preemptively calculate the grid's expected long-term capacity. Using these strategies, we create modified versions of the backfill and replication algorithms that include all three pro-active strategies, in order to ascertain how effective each is at preventing job failures during execution. We also extend our earlier work on a co-ordinate based allocation strategy; the extended algorithm also shows continual improvement when operating under the same execution environment. In our experiments, we compare these enhanced algorithms to their original forms and show that pro-active failure handling can, in some cases, avoid all job failures during execution. We also show that, of the algorithms we have considered, NSA provides the best balance of enhanced throughput and job failures during execution.