Exploiting availability prediction in distributed systems

Authors:
James W. Mickens; Brian D. Noble
Affiliations:
University of Michigan; University of Michigan
Venue:
NSDI'06 Proceedings of the 3rd conference on Networked Systems Design & Implementation - Volume 3
Year:
2006

Abstract

Loosely coupled distributed systems have significant scale and cost advantages over more traditional architectures, but the availability of the nodes in these systems varies widely. Availability modeling is crucial for predicting per-machine resource burdens and understanding emergent, system-wide phenomena. We present new techniques for predicting availability and test them using traces taken from three distributed systems. We then describe three applications of availability prediction. The first, availability-guided replica placement, reduces object copying in a distributed data store while increasing data availability. The second shows how availability prediction can improve routing in delay-tolerant networks. The third combines availability prediction with virus modeling to improve forecasts of global infection dynamics.
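
As a rough illustration of the first application, availability-guided replica placement can be sketched as follows. This is a minimal sketch under stated assumptions, not the predictors or placement policy from the paper: the hour-of-day frequency estimator, the function names (`predict_availability`, `place_replicas`, `object_availability`), the 0/1 hourly trace format, and the independence assumption between nodes are all hypothetical choices made for the example.

```python
from math import prod


def predict_availability(history, hour, period=24):
    """Estimate the probability that a node is up at `hour`.

    `history` is a list of 0/1 uptime samples, one per hour, oldest first.
    The estimate is the fraction of past samples taken at the same position
    in the cycle (e.g. the same hour of day). Illustrative estimator only;
    it is an assumption, not one of the paper's predictors.
    """
    samples = history[hour % period::period]
    return sum(samples) / len(samples) if samples else 0.0


def place_replicas(histories, hour, k, period=24):
    """Pick the k nodes with the highest predicted availability at `hour`.

    Ties are broken by node name so the result is deterministic.
    Returns the chosen node names and the full score table.
    """
    scores = {
        node: predict_availability(h, hour, period)
        for node, h in histories.items()
    }
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return ranked[:k], scores


def object_availability(probs):
    """P(at least one replica is up), assuming independent nodes (an assumption)."""
    return 1.0 - prod(1.0 - p for p in probs)


if __name__ == "__main__":
    # Hypothetical traces with a 4-sample cycle to keep the example small.
    histories = {
        "a": [1, 1, 0, 0] * 3,
        "b": [1, 0, 1, 0] * 3,
        "c": [0, 0, 0, 1] * 3,
    }
    chosen, scores = place_replicas(histories, hour=0, k=2, period=4)
    print(chosen, object_availability(scores[n] for n in chosen))
```

Placing replicas on nodes that are predicted to be up means fewer replicas need to be re-copied when hosts leave, which is the intuition behind the reduction in object copying claimed in the abstract. A real system would also need to account for correlated failures, which the independence assumption here ignores.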