Estimating deadline-miss probabilities of tasks in large distributed systems

Authors:
Dongping Wang;Bin Gong;Guoling Zhao
Affiliations:
Department of Computer Science and Technology, ShanDong University, Jinan, China;Department of Computer Science and Technology, ShanDong University, Jinan, China;Shandong College of Electronic Technology, Jinan, China
Venue:
GPC'12 Proceedings of the 7th international conference on Advances in Grid and Pervasive Computing
Year:
2012

Abstract

In the past decade, large distributed systems built on unreliable hosts, including P2P systems and volunteer computing systems, have become common. The volatile nature of their resources makes it challenging to schedule tasks with soft deadlines in such systems. In this paper we examine one of the critical problems: estimating the deadline-miss probabilities of tasks running on unreliable hosts. By analyzing trace data gathered from an actual volunteer computing system, we identify a general property of a host's period available fraction, and on that basis we propose an efficient method for estimating deadline-miss probability. To evaluate the accuracy of this method, we conduct trace-driven simulations; the results show that the average absolute difference between the estimated probability and the observed miss ratio is smaller than 2%. To compare our method with two other methods, we simulate a scheduler that distributes tasks based on the estimated probability. The results show that our method performs better.
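The abstract does not spell out the estimator, so the following is only a minimal illustrative sketch and not the authors' method. It assumes that a host's "period available fraction" is the share of a time window during which the host is available, that a task needing `work_hours` of host time within `deadline_hours` misses its deadline when that fraction falls below `work_hours / deadline_hours`, and that the miss probability can be approximated by the empirical distribution of past fractions. All function and parameter names are hypothetical. The second function computes the accuracy measure named in the abstract, the average absolute difference between estimated probabilities and observed miss ratios.

```python
from bisect import bisect_left


def miss_probability(fractions, work_hours, deadline_hours):
    """Empirical estimate of P(available fraction < work / deadline).

    `fractions` is a non-empty sample of past available fractions in [0, 1].
    This thresholding model is an assumption made for illustration only.
    """
    if deadline_hours <= 0:
        raise ValueError("deadline_hours must be positive")
    if not fractions:
        raise ValueError("fractions must be non-empty")
    needed = work_hours / deadline_hours
    if needed > 1.0:
        # More work than the deadline window can hold: a certain miss.
        return 1.0
    ordered = sorted(fractions)
    # Count of samples strictly below the required fraction.
    return bisect_left(ordered, needed) / len(ordered)


def mean_absolute_difference(estimated, observed):
    """Average |estimate - observed ratio| over paired values."""
    pairs = list(zip(estimated, observed))
    if not pairs:
        raise ValueError("need at least one pair")
    return sum(abs(e - o) for e, o in pairs) / len(pairs)


if __name__ == "__main__":
    history = [0.2, 0.4, 0.6, 0.8]  # made-up sample, not trace data
    print(miss_probability(history, work_hours=5, deadline_hours=10))
```

A scheduler of the kind the abstract mentions could, under these same assumptions, call `miss_probability` for each candidate host and prefer the host with the lowest estimate.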