Workload management of cooperatively federated computing clusters

Authors:
Percival Xavier;Wentong Cai;Bu-Sung Lee
Affiliations:
School of Computer Engineering, Nanyang Technological University, Singapore 639798;School of Computer Engineering, Nanyang Technological University, Singapore 639798;School of Computer Engineering, Nanyang Technological University, Singapore 639798
Venue:
The Journal of Supercomputing
Year:
2006


Abstract

Cooperative resource sharing enables distinct organizations to form a federation of computing resources. The motivation for cooperation is that organizations with irregular usage patterns of their local resources are likely to serve each other by trading unused CPU cycles. Resource sharing thus enables organizations to purchase resources at a feasible level while still meeting peak computational throughput requirements. The federation results in a community grid that must be managed. A functional broker is deployed to facilitate remote resource access within the community grid. A major issue is correlation in job arrivals caused by seasonal usage and/or coincident resource demand patterns. These correlations produce highly bursty job arrivals, causing the broker's job queue to grow to the point where its performance is severely impaired. Since job arrivals cannot be controlled, management strategies must be employed to admit jobs in a manner that sustains a fair level of resource allocation performance at all participating organizations in the community. In this paper, we present a theoretical analysis of the effect of job traffic burstiness on resource allocation performance in order to elicit the general job management strategies to be employed. Based on the analysis, we define and justify job management strategies for the resource broker to cope with overload conditions caused by job arrival correlations.