In this paper, we examine the problem of predicting machine availability in desktop and enterprise computing environments. Predicting how long a machine will run before it restarts (its availability duration) is critical to application scheduling and resource characterization in federated systems. We describe one parametric model-fitting technique and two nonparametric prediction techniques, and compare their accuracy in predicting the quantiles of empirically observed machine availability distributions. We describe each method analytically and evaluate its precision using a synthetic availability trace drawn from a known distribution. To assess their practical efficacy, we apply the methods to machine availability traces from three separate desktop and enterprise computing environments, and we evaluate each one by the accuracy with which it predicts availability in a trace-driven simulation. Our results indicate that availability duration can be predicted with quantifiable confidence bounds, and that these bounds can be used as conservative bounds on lifetime predictions. Moreover, a nonparametric method based on a binomial approach generates the most accurate estimates.
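The abstract does not spell out the binomial approach. One standard nonparametric technique of that kind, sketched here as an assumption rather than as the paper's exact method, bounds a quantile using order statistics. Given n independent availability durations from a continuous distribution, the k-th smallest sample lies at or below the true q-quantile with probability P(B >= k), where B ~ Binomial(n, q). Choosing the largest k whose tail probability still meets a target confidence C yields a conservative lower bound. The function names, the Weibull parameters used to build the synthetic trace, and the choice of q = 0.05 and C = 0.95 are hypothetical illustrations, not values taken from the paper.

```python
import math
import random
from typing import Optional, Sequence


def binomial_pmf(n: int, i: int, q: float) -> float:
    """P(B = i) for B ~ Binomial(n, q), computed in log space to avoid overflow."""
    log_p = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
             + i * math.log(q) + (n - i) * math.log1p(-q))
    return math.exp(log_p)


def quantile_lower_bound(samples: Sequence[float], q: float,
                         confidence: float) -> Optional[float]:
    """Hypothetical helper: return the largest order statistic x_(k) such that
    P(x_(k) <= true q-quantile) >= confidence, or None if n is too small.

    Assumes independent samples from a continuous distribution, so that
    P(x_(k) <= xi_q) = P(B >= k) with B ~ Binomial(n, q).
    """
    if not 0.0 < q < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("q and confidence must lie strictly between 0 and 1")
    xs = sorted(samples)
    n = len(xs)
    best_k = None
    cdf = 0.0  # running P(B <= k - 1)
    for k in range(1, n + 1):
        cdf += binomial_pmf(n, k - 1, q)
        if 1.0 - cdf >= confidence:
            best_k = k  # tail P(B >= k) still meets the target confidence
        else:
            break  # the tail only shrinks as k grows, so stop here
    return None if best_k is None else xs[best_k - 1]


if __name__ == "__main__":
    # Synthetic trace from a known distribution (hypothetical Weibull parameters).
    rng = random.Random(42)
    scale, shape = 36.0, 0.6  # hours; illustrative only
    trace = [rng.weibullvariate(scale, shape) for _ in range(500)]
    q, conf = 0.05, 0.95
    bound = quantile_lower_bound(trace, q, conf)
    # Weibull quantile: scale * (-ln(1 - q)) ** (1 / shape)
    true_q = scale * (-math.log(1.0 - q)) ** (1.0 / shape)
    print(f"{conf:.0%} lower bound on the {q:.0%} quantile: {bound:.3f} h")
    print(f"true {q:.0%} quantile of the generating Weibull: {true_q:.3f} h")
```

When the bound holds, a machine drawn from the same distribution stays up at least that long with probability of at least 1 - q, which is one way to read such a bound as conservative. Note that the method needs enough samples: with q = 0.05 and C = 0.95, fewer than 59 observations produce no bound at all, because even the minimum sample fails to reach the target confidence.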