The functional heterogeneity of non-dedicated computational grids will increase with the inclusion of resources from desktop grids, P2P systems, and even mobile grids. Machine failure characteristics, and the individual and organizational policies that govern how the grid may use resources, will vary even more than they do today. Because grid applications differ in how well they tolerate the failure of the host on which they run, grid schedulers must begin to predict and account for how resources transition between availability modes. Toward this goal, this paper introduces five availability states and characterizes a Condor pool trace to uncover when, how, and why its resources reside in, and transition between, these states. This characterization suggests resource categories that schedulers can use to make better mapping decisions. Simulations of how a variety of jobs would run on the traced resources demonstrate this approach’s potential for performance improvement. A simple predictor based on the previous day’s behavior indicates that the states and categories are somewhat predictable, which supports the potential usefulness of multi-state grid resource availability characterization.
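As an illustration of the previous-day predictor mentioned above, the following is a minimal sketch and not the paper's implementation. It assumes a resource's day is split into equal time slots with one observed state per slot. The state labels `S1` to `S5` are placeholders, since the abstract does not name the five states, and the function names are hypothetical.

```python
from collections import Counter

# Placeholder labels (assumption): the abstract introduces five availability
# states but does not name them, so generic identifiers are used here.
STATES = ("S1", "S2", "S3", "S4", "S5")


def predict_from_previous_day(yesterday):
    """Predict each slot's state today as the state seen in that slot yesterday."""
    for state in yesterday:
        if state not in STATES:
            raise ValueError(f"unknown availability state: {state!r}")
    return list(yesterday)


def accuracy(predicted, observed):
    """Fraction of time slots in which the predicted state matched the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("traces must cover the same number of time slots")
    if not predicted:
        return 0.0
    hits = sum(p == o for p, o in zip(predicted, observed))
    return hits / len(predicted)


def transition_counts(trace):
    """Count state changes between consecutive slots, ignoring slots with no change."""
    return Counter((a, b) for a, b in zip(trace, trace[1:]) if a != b)


# Hypothetical six-slot traces for a single resource (illustrative data only).
yesterday = ["S1", "S1", "S2", "S2", "S1", "S5"]
today = ["S1", "S2", "S2", "S2", "S1", "S1"]

prediction = predict_from_previous_day(yesterday)
score = accuracy(prediction, today)
changes = transition_counts(yesterday)
```

Under these assumptions, `score` measures how often yesterday's state in a slot matched today's, and `changes` summarizes which transitions between states occurred, the kind of per-resource summary a scheduler could use to group resources into categories.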