Automatic methods for predicting machine availability in desktop Grid and peer-to-peer systems

Authors:
J. Brevik; D. Nurmi; R. Wolski
Affiliations:
Dept. of Math. & Comput. Sci., Wheaton Coll., Norton, MA, USA; La Trobe Univ., Bundoora, Vic., Australia; Sch. of Telecommun. Eng., Valladolid Univ., Spain
Venue:
CCGRID '04 Proceedings of the 2004 IEEE International Symposium on Cluster Computing and the Grid
Year:
2004

Abstract

In this paper we examine the problem of predicting machine availability in desktop and enterprise computing environments. Predicting how long a machine will run until it restarts (its availability duration) is critically useful to application scheduling and resource characterization in federated systems. We describe one parametric model-fitting technique and two nonparametric prediction techniques, comparing their accuracy in predicting the quantiles of empirically observed machine availability distributions. We describe each method analytically and evaluate its precision using a synthetic trace of machine availability constructed from a known distribution. To detail their practical efficacy, we apply them to machine availability traces from three separate desktop and enterprise computing environments, and evaluate each method in terms of the accuracy with which it predicts availability in a trace-driven simulation. Our results indicate that availability duration can be predicted with quantifiable confidence bounds and that these bounds can be used as conservative bounds on lifetime predictions. Moreover, a nonparametric method based on a binomial approach generates the most accurate estimates.
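
One standard way to obtain a nonparametric, binomial-based confidence bound on a quantile uses order statistics. The sketch below illustrates that general idea and is not taken from the paper, so it may differ from the authors' formulation. It assumes the observed availability durations are independent draws from a single continuous distribution. The function names binom_cdf and quantile_lower_bound and the sample durations are hypothetical. The number of observations falling at or below the true q-quantile is Binomial(n, q), so the k-th smallest observation is a lower bound on that quantile with probability P(Bin(n, q) >= k).

```python
from math import comb


def binom_cdf(k, n, p):
    """Return P(Bin(n, p) <= k) by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def quantile_lower_bound(samples, q, confidence):
    """Lower confidence bound on the q-quantile from order statistics.

    Returns the largest order statistic X_(k) such that
    P(X_(k) <= true q-quantile) = P(Bin(n, q) >= k) >= confidence,
    or None when the sample is too small to support any bound.
    Assumes i.i.d. samples from a continuous distribution.
    """
    xs = sorted(samples)
    n = len(xs)
    best_k = None
    for k in range(1, n + 1):
        # P(Bin(n, q) >= k) decreases as k grows, so stop at the first failure.
        if 1.0 - binom_cdf(k - 1, n, q) >= confidence:
            best_k = k
        else:
            break
    return None if best_k is None else xs[best_k - 1]


if __name__ == "__main__":
    # Hypothetical availability durations in hours (illustrative only).
    durations = [2.5, 40.0, 7.1, 120.3, 0.8, 15.6, 63.2, 9.9, 31.4, 5.0,
                 88.7, 1.2, 22.0, 47.5, 3.3, 11.8, 76.1, 18.4, 6.6, 27.9]
    bound = quantile_lower_bound(durations, q=0.25, confidence=0.95)
    print("95% lower bound on the 0.25 quantile:", bound)
```

A bound on a low quantile is conservative in the sense the abstract describes: with the stated confidence, at least a fraction 1 - q of availability durations exceed the returned value. When the function returns None, the trace is too short to support a bound at that quantile and confidence level.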