Proceedings of the 16th International ACM Sigsoft symposium on Component-based software engineering
Complex scientific workflows are increasingly executed on computational grids. In addition to the challenges of managing and scheduling these workflows, reliability challenges arise because of the unreliable nature of large-scale grid infrastructure. Fault tolerance mechanisms such as over-provisioning and checkpoint-recovery are used in current grid application management systems to address these reliability challenges. In this work, we propose new approaches that combine these fault tolerance techniques with existing workflow scheduling algorithms. We present a study of the effectiveness of the combined approaches by analyzing their impact on the reliability of workflow execution, workflow performance, and resource usage under different reliability models, failure prediction accuracies, and workflow application types.