Volunteer-based grid computing resources are characteristically volatile: they frequently become unavailable because their owners retain autonomy over them. This volatility significantly affects the applications the resources host. Availability predictors can forecast unavailability and give schedulers reliability information which, combined with information about speed and load, helps them make better scheduling decisions. This paper studies the use of such predictions to decide when to replicate jobs. In particular, our predictors forecast the probability that a job will complete uninterrupted, and our schedulers replicate the jobs that are least likely to do so. Our strategies outperform other comparable replication strategies, as measured by improved makespan and fewer redundant operations. We define a new "replication efficiency" metric and demonstrate, in a trace-based grid simulation, that our availability predictor provides information that makes our schedulers more efficient than the most closely related replication strategy across a variety of loads. Under low load conditions, our techniques come within 6% of the makespan improvement of a previously proposed replication technique while creating 76.8% fewer replicas; under higher loads, they improve makespan marginally while creating 72.5% fewer replicas.
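The replication rule described above (replicate the jobs least likely to complete uninterrupted) can be illustrated with a minimal sketch. This is not the paper's scheduler: the `Job` class, the `select_jobs_to_replicate` function, the `replica_budget` parameter, and the probability values are all hypothetical and chosen only for illustration. In the paper the probabilities come from an availability predictor, which is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    # Hypothetical field: predicted probability that the job
    # finishes without being interrupted by resource unavailability.
    completion_probability: float


def select_jobs_to_replicate(jobs, replica_budget):
    """Return up to `replica_budget` jobs with the lowest predicted
    probability of completing uninterrupted (assumed selection rule)."""
    ranked = sorted(jobs, key=lambda j: j.completion_probability)
    return ranked[:max(0, replica_budget)]


# Illustrative values only; not taken from the paper's traces.
jobs = [Job("a", 0.95), Job("b", 0.40), Job("c", 0.70), Job("d", 0.15)]
chosen = select_jobs_to_replicate(jobs, 2)
print([j.name for j in chosen])  # prints ['d', 'b']
```

Limiting replication to a fixed budget is one simple way to trade makespan improvement against redundant work; a probability threshold would be an equally plausible assumption.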