Scaling virtual organization clusters over a wide area network using the Kestrel workload management system

Authors:
Lance Stout;Michael Fenn;Michael A. Murphy;Sebastien Goasguen
Affiliations:
Clemson University, Clemson, SC;Clemson University, Clemson, SC;Clemson University, Clemson, SC;Clemson University, Clemson, SC
Venue:
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Year:
2010

Abstract

This paper presents the results of using the Kestrel workload management system to operate a Virtual Organization Cluster (VOC) across multiple sites. Kestrel is a Many-Task Computing (MTC) framework based on the Extensible Messaging and Presence Protocol (XMPP). It provides a special-purpose scheduler that can offer better VOC scalability under certain workload assumptions, namely CPU-bound processes and bag-of-tasks jobs. Experimental results show that Kestrel can operate a VOC of at least 1600 worker nodes, with all nodes visible to the scheduler at once. When using multiple sites located in both North America and Europe, the latency added to message round-trip times was on the order of 0.3 seconds. To offset the overhead of XMPP processing, a task execution time of 2 seconds is sufficient for a pool of 900 workers on a single site to operate at nearly 100% utilization. Tasks that take on the order of 30 seconds to a minute to execute would compensate for the increased latency during job dispatch across multiple sites.
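
The relationship between task length and dispatch latency can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the paper: it assumes a deliberately simple model in which each worker sits idle for one fixed per-task overhead before computing, and it reuses the 0.3 second multi-site latency figure from the abstract as that overhead. The function name `worker_utilization` and the constant `ADDED_LATENCY_SECONDS` are hypothetical and chosen only for illustration.

```python
def worker_utilization(task_seconds: float, overhead_seconds: float) -> float:
    """Fraction of wall-clock time a worker spends computing.

    Assumed model (not from the paper): every task costs one fixed
    dispatch overhead during which the worker is idle.
    """
    if task_seconds <= 0:
        raise ValueError("task_seconds must be positive")
    if overhead_seconds < 0:
        raise ValueError("overhead_seconds must be non-negative")
    return task_seconds / (task_seconds + overhead_seconds)


# Latency added by multi-site operation, as reported in the abstract.
ADDED_LATENCY_SECONDS = 0.3

if __name__ == "__main__":
    # Task lengths mentioned in the abstract: 2 s, 30 s, and one minute.
    for task_seconds in (2, 30, 60):
        share = worker_utilization(task_seconds, ADDED_LATENCY_SECONDS)
        print(f"{task_seconds:>2} s tasks: {share:.1%} of time spent computing")
```

Under this assumed model, 2 second tasks leave a worker computing roughly 87% of the time, while 30 second and one minute tasks raise that to about 99.0% and 99.5%. This matches the direction of the abstract's claim that longer tasks absorb multi-site dispatch latency, but it is not a reproduction of the measured results, which depend on scheduler and XMPP server behavior that the model ignores.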