Availability Prediction Based Replication Strategies for Grid Environments

Authors:
Brent Rood;Michael J. Lewis
Venue:
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Year:
2010

Citing 19
Cited 1

Task Allocation for Maximizing Reliability of Distributed Computer Systems

IEEE Transactions on Computers
Static and dynamic processor scheduling disciplines in heterogeneous parallel architectures

Journal of Parallel and Distributed Computing
The network weather service: a distributed resource performance forecasting service for metacomputing

Future Generation Computer Systems - Special issue on metacomputing
Identifying Dynamic Replication Strategies for a High-Performance Data Grid

GRID '01 Proceedings of the Second International Workshop on Grid Computing
Improving Performance via Computational Replication on a Large-Scale Computational Grid

CCGRID '03 Proceedings of the 3rd International Symposium on Cluster Computing and the Grid
Heuristics for Scheduling Parameter Sweep Applications in Grid Environments

HCW '00 Proceedings of the 9th Heterogeneous Computing Workshop
Data Replication Strategies in Grid Environments

ICA3PP '02 Proceedings of the Fifth International Conference on Algorithms and Architectures for Parallel Processing
Predicting node availability in peer-to-peer networks

SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Empirical Studies on the Behavior of Resource Availability in Fine-Grained Cycle Sharing Systems

ICPP '06 Proceedings of the 2006 International Conference on Parallel Processing
Efficient task replication and management for adaptive fault tolerance in mobile Grid environments

Future Generation Computer Systems - Special section: Information engineering and enterprise architecture in distributed computing environments
Failure Prediction in Computational Grids

ANSS '07 Proceedings of the 40th Annual Simulation Symposium
Exploiting availability prediction in distributed systems

NSDI'06 Proceedings of the 3rd conference on Networked Systems Design & Implementation - Volume 3
Performability modeling for scheduling and fault tolerance strategies for scientific workflows

HPDC '08 Proceedings of the 17th international symposium on High performance distributed computing
Resource Availability Prediction for Improved Grid Scheduling

ESCIENCE '08 Proceedings of the 2008 Fourth IEEE International Conference on eScience
Multi-state grid resource availability characterization

GRID '07 Proceedings of the 8th IEEE/ACM International Conference on Grid Computing
Scheduling on the Grid via multi-state resource availability prediction

GRID '08 Proceedings of the 2008 9th IEEE/ACM International Conference on Grid Computing
Exploiting replication and data reuse to efficiently schedule data-intensive applications on grids

JSSPP'04 Proceedings of the 10th international conference on Job Scheduling Strategies for Parallel Processing
Modeling machine availability in enterprise and wide-area distributed computing environments

Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Fault-Tolerant scheduling for bag-of-tasks grid applications

EGC'05 Proceedings of the 2005 European conference on Advances in Grid Computing

Resource utilization prediction: a proposal for information technology research

Proceedings of the 1st Annual conference on Research in information technology

Abstract

Volunteer-based grid computing resources are characteristically volatile and frequently become unavailable because their owners retain autonomy over them. This volatility significantly affects the applications the resources host. Availability predictors can forecast unavailability and provide schedulers with reliability information, which, combined with information about speed and load, helps them make better scheduling decisions. This paper studies how to use this prediction information to decide when to replicate jobs. In particular, our predictors forecast the probability that a job will complete uninterrupted, and our schedulers replicate the jobs that are least likely to do so. Our strategies outperform other comparable replication strategies, as measured by improved makespan and fewer redundant operations. We define a new "replication efficiency" metric and demonstrate that our availability predictor provides information that allows our schedulers to be more efficient than the most closely related replication strategy for a variety of loads in a trace-based grid simulation. Under low load conditions, our techniques come within 6% of the makespan improvement of a previously proposed replication technique while creating 76.8% fewer replicas; under higher loads, they improve makespan marginally while creating 72.5% fewer replicas.
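
The selection idea in the abstract, replicating the jobs least likely to complete uninterrupted, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `Job` type, the `select_jobs_to_replicate` function, the fixed `replica_budget`, and the probability values are hypothetical and are not taken from the paper, which may well use a different selection rule (for example, a probability threshold).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    # Hypothetical predictor output: chance the job finishes uninterrupted.
    completion_probability: float


def select_jobs_to_replicate(jobs, replica_budget):
    """Return up to `replica_budget` jobs with the lowest predicted
    probability of completing uninterrupted (illustrative policy only)."""
    if replica_budget <= 0:
        return []
    # Sort ascending so the least reliable placements come first.
    ranked = sorted(jobs, key=lambda job: job.completion_probability)
    return ranked[:replica_budget]


# Made-up example values, not data from the paper.
jobs = [
    Job("a", 0.95),
    Job("b", 0.40),
    Job("c", 0.72),
    Job("d", 0.15),
]

chosen = select_jobs_to_replicate(jobs, replica_budget=2)
print([job.name for job in chosen])  # ['d', 'b']
```

Capping the number of replicas with a budget is one simple way to trade makespan improvement against redundant work; a threshold on the predicted probability would be an equally plausible variant.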