Scaling virtual organization clusters over a wide area network using the Kestrel workload management system

  • Authors:
  • Lance Stout; Michael Fenn; Michael A. Murphy; Sebastien Goasguen

  • Affiliations:
  • Clemson University, Clemson, SC (all four authors)

  • Venue:
  • Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
  • Year:
  • 2010

Abstract

This paper presents the results of using the Kestrel workload management system to test the operation of a Virtual Organization Cluster (VOC) across multiple sites. Kestrel is a Many-Task Computing (MTC) framework based on the Extensible Messaging and Presence Protocol (XMPP). It provides a special-purpose scheduler that can offer better VOC scalability under certain workload assumptions, namely CPU-bound processes and bag-of-tasks jobs. Experimental results show that Kestrel can operate a VOC of at least 1600 worker nodes with all nodes visible to the scheduler at once. When using multiple sites located in both North America and Europe, the latency added to the round-trip time of messages was on the order of 0.3 seconds. To offset the overhead of XMPP processing, a task execution time of 2 seconds is sufficient for a pool of 900 workers on a single site to operate at near 100% utilization. Across multiple sites, tasks that take on the order of 30 seconds to a minute to execute would compensate for the increased latency during job dispatch.
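
The trade-off between task length and dispatch overhead can be illustrated with a small back-of-the-envelope model. The sketch below is not taken from the paper: it assumes each task pays a fixed per-task overhead that does not overlap with execution, so a worker's utilization is task time divided by task time plus overhead. The function names `utilization` and `min_task_seconds` are hypothetical, and treating the 0.3 second round-trip figure as the whole per-task overhead is an illustrative simplification.

```python
def utilization(task_seconds: float, overhead_seconds: float) -> float:
    """Fraction of a worker's wall-clock time spent on useful work.

    Assumed model (not from the paper): every task pays a fixed dispatch
    overhead that does not overlap with execution.
    """
    if task_seconds <= 0 or overhead_seconds < 0:
        raise ValueError("task_seconds must be > 0 and overhead_seconds >= 0")
    return task_seconds / (task_seconds + overhead_seconds)


def min_task_seconds(target: float, overhead_seconds: float) -> float:
    """Shortest task length that reaches the target utilization.

    Obtained by solving target = t / (t + overhead) for t.
    """
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    if overhead_seconds < 0:
        raise ValueError("overhead_seconds must be >= 0")
    return target * overhead_seconds / (1 - target)


if __name__ == "__main__":
    overhead = 0.3  # seconds; round-trip latency figure quoted in the abstract
    for task in (2.0, 30.0, 60.0):
        print(f"task={task:5.1f}s  utilization={utilization(task, overhead):.3f}")
    print(f"task length for 99% utilization: {min_task_seconds(0.99, overhead):.1f}s")
```

Under these assumptions, a 0.3 second overhead and a 99% utilization target correspond to tasks of roughly 30 seconds, which falls in the range the abstract mentions. The paper's own measurements also reflect scheduler and XMPP processing costs that this model ignores.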