Multi-state grid resource availability characterization

Authors:
Brent Rood;Michael J. Lewis
Affiliations:
Department of Computer Science, State University of New York (SUNY) at Binghamton, NY, 13902, USA;Department of Computer Science, State University of New York (SUNY) at Binghamton, NY, 13902, USA
Venue:
GRID '07 Proceedings of the 8th IEEE/ACM International Conference on Grid Computing
Year:
2007

Abstract

The functional heterogeneity of non-dedicated computational grids will increase with the inclusion of resources from desktop grids, P2P systems, and even mobile grids. Machine failure characteristics, as well as individual and organizational policies for resource usage by the grid, will vary even more than they already do. Since grid applications also vary as to how well they tolerate the failure of the host on which they run, grid schedulers must begin to predict and consider how resources will transition between availability modes. Toward this goal, this paper introduces five availability states, and characterizes a Condor pool trace that uncovers when, how, and why its resources reside in, and transition between, these states. This characterization suggests resource categories that schedulers can use to make better mapping decisions. Simulations that characterize how a variety of jobs would run on the traced resources demonstrate this approach’s potential for performance improvement. A simple predictor based on the previous day’s behavior indicates that the states and categories are somewhat predictable, thereby supporting the potential usefulness of multi-state grid resource availability characterization.
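As an illustration of the previous-day idea mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a resource's trace is sampled once per hour and that each sample is one of five availability states. The abstract does not name the states, so the labels `S1` through `S5`, the hourly sampling, and all function names here are hypothetical.

```python
from collections import Counter

# Hypothetical placeholder labels: the abstract says there are five
# availability states but does not name them.
STATES = ("S1", "S2", "S3", "S4", "S5")


def predict_from_previous_day(previous_day, hour):
    """Predict the state at `hour` as the state seen at that hour yesterday."""
    return previous_day[hour]


def accuracy(previous_day, today):
    """Fraction of samples where the previous-day prediction matched."""
    hits = sum(
        1
        for hour, actual in enumerate(today)
        if predict_from_previous_day(previous_day, hour) == actual
    )
    return hits / len(today)


def transition_counts(trace):
    """Count changes of state between consecutive samples in a trace."""
    return Counter((a, b) for a, b in zip(trace, trace[1:]) if a != b)


# Made-up example traces (24 hourly samples each), for illustration only.
yesterday = ["S1"] * 8 + ["S2"] * 9 + ["S1"] * 7
today = ["S1"] * 9 + ["S2"] * 8 + ["S1"] * 7

print(accuracy(yesterday, today))
print(transition_counts(today))
```

Under these assumptions, a scheduler could compare the prediction accuracy or the transition counts across resources to group them into categories, which is the kind of use the abstract suggests; how the paper actually defines its categories is not stated here.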