In this paper, we consider the problem of supporting fault tolerance for adaptive, time-critical applications in heterogeneous and unreliable grid computing environments. For this class of applications, our goal is to optimize a user-specified benefit function while meeting a time deadline. Our first contribution is a multi-objective optimization algorithm that schedules the application onto the most efficient and reliable resources. In this way, the application can achieve the maximum benefit while also maximizing the success rate, i.e., the probability of finishing execution without failures. For the cases where failures do occur, we have developed a hybrid failure-recovery scheme to ensure that the application can complete within the pre-specified time interval. Our experimental results show that our scheduling algorithm achieves a higher benefit than several heuristic-based greedy scheduling algorithms, while incurring negligible overhead. Applying the hybrid failure-recovery scheme further improves the benefit and raises the success rate to 100%.
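The trade-off between benefit and success rate can be illustrated with a minimal Python sketch. This is not the paper's algorithm; it rests on several stated assumptions. Resources fail independently with exponentially distributed failures, so a schedule's success rate is exp(-Σ λ_r t_r) over the busy time t_r of each used resource. The task sizes, resource speeds and failure rates (`TASKS`, `RESOURCES`, `DEADLINE`) are hypothetical. The benefit function, which simply rewards finishing before the deadline, is also hypothetical. Under these assumptions, the sketch enumerates every task-to-resource assignment, discards those that miss the deadline, and keeps the Pareto-optimal ones in (benefit, success rate).

```python
import math
from itertools import product

# Hypothetical inputs: work units per task, and per-resource (speed, failures per time unit).
TASKS = [4.0, 6.0, 2.0]
RESOURCES = {"fast": (2.0, 0.05), "medium": (1.5, 0.01), "slow": (1.0, 0.002)}
DEADLINE = 10.0


def evaluate(assignment):
    """Return (benefit, success_rate, makespan) for one task -> resource assignment."""
    busy = {}
    for work, res in zip(TASKS, assignment):
        speed, _ = RESOURCES[res]
        # Tasks placed on the same resource are assumed to run back to back.
        busy[res] = busy.get(res, 0.0) + work / speed
    makespan = max(busy.values())
    # Assumption: independent exponential failures, and every used resource
    # must stay up for its whole busy time for the run to succeed.
    success = math.exp(-sum(RESOURCES[r][1] * t for r, t in busy.items()))
    # Hypothetical user benefit: reward for finishing earlier than the deadline.
    benefit = DEADLINE - makespan
    return benefit, success, makespan


def dominates(a, b):
    """a dominates b if it is no worse in both objectives and strictly better in one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_schedules():
    """Brute-force all assignments, keep deadline-feasible, non-dominated ones."""
    feasible = []
    for assignment in product(RESOURCES, repeat=len(TASKS)):
        benefit, success, makespan = evaluate(assignment)
        if makespan <= DEADLINE:
            feasible.append((assignment, (benefit, success)))
    return [(a, s) for a, s in feasible
            if not any(dominates(other, s) for _, other in feasible)]


front = pareto_schedules()
# One possible way to pick a single schedule from the front (an assumption,
# not the paper's rule): maximize expected benefit = benefit * success rate.
best = max(front, key=lambda item: item[1][0] * item[1][1])
```

For example, placing all three tasks on `"fast"` finishes at time 6 with a success rate of about 0.74. Spreading them as `("fast", "medium", "slow")` finishes at time 4 with a success rate of about 0.87, so the spread schedule dominates. Exhaustive enumeration grows exponentially with the number of tasks, so it is only useful for illustrating the objectives. A practical scheduler would search the space heuristically.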