In the age of cloud, Grid, P2P, and volunteer distributed computing, large-scale systems with tens of thousands of unreliable hosts are increasingly common. Invariably, these systems are composed of heterogeneous hosts whose individual availability exhibits different statistical properties (for example, stationary versus nonstationary behavior) and fits different models (for example, exponential, Weibull, or Pareto probability distributions). In this paper, we describe an effective method for discovering subsets of hosts whose availability has similar statistical properties and can be modeled with similar probability distributions. We apply this method to about 230,000 host availability traces obtained from a real Internet-distributed system, namely SETI@home. We find that about 21 percent of hosts exhibit availability that is a truly random process, and that these hosts can often be modeled accurately with a few distinct distributions from different families. We show that our models are useful and accurate in the context of a scheduling problem that deals with resource brokering. We believe that these methods and models are critical for the design of stochastic scheduling algorithms across large systems where host availability is uncertain.
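To illustrate the kind of model fitting the abstract describes, the sketch below fits the three named distribution families (exponential, Weibull, Pareto) to a sample of availability durations and ranks them by a Kolmogorov-Smirnov goodness-of-fit statistic. This is a minimal illustration, not the paper's actual method: the `durations` array here is synthetic (drawn from a Weibull for demonstration), whereas the study uses real SETI@home traces, and the paper's full procedure also involves randomness testing and clustering of hosts.

```python
# Hedged sketch: compare candidate availability models by KS statistic.
# Assumptions: synthetic Weibull-distributed durations stand in for real
# host availability traces; location is fixed at 0 since durations are
# nonnegative. Uses scipy.stats distribution fitting.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic availability durations (hours); a real trace would go here.
durations = stats.weibull_min.rvs(c=0.7, scale=24.0, size=2000,
                                  random_state=rng)

candidates = {
    "exponential": stats.expon,
    "weibull": stats.weibull_min,
    "pareto": stats.pareto,
}

results = {}
for name, dist in candidates.items():
    # Maximum-likelihood fit with the location parameter pinned to 0.
    params = dist.fit(durations, floc=0)
    # Smaller KS statistic = closer fit between data and fitted CDF.
    ks_stat, _ = stats.kstest(durations, dist.cdf, args=params)
    results[name] = ks_stat

best = min(results, key=results.get)
print(best, results)
```

Ranking by the KS statistic is one simple way to pick among families; on data with a heavy tail (shape parameter below 1 here), the Weibull fit should track the sample more closely than the memoryless exponential.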