Using Kestrel and XMPP to Support the STAR Experiment in the Cloud

  • Authors:
  • Lance Stout; Matthew Walker; Jérôme Lauret; Sebastien Goasguen; Michael A. Murphy

  • Affiliations:
  • &yet, LLC, Richland, USA 99352; Massachusetts Institute of Technology, Cambridge, USA 02139; Brookhaven National Laboratory, Upton, USA 11973; School of Computing, Clemson University, Clemson, USA 29634-0974; Coastal Carolina University, Conway, USA 29528

  • Venue:
  • Journal of Grid Computing
  • Year:
  • 2013

Abstract

This paper presents the results and experiences of adapting and improving the Many-Task Computing (MTC) framework Kestrel for use with bag-of-tasks applications, and with the STAR experiment in particular. Kestrel is a lightweight, highly available job scheduling framework for Virtual Organization Clusters (VOCs) constructed in the cloud. Kestrel uses the Extensible Messaging and Presence Protocol (XMPP) to increase MTC platform scalability and to mitigate faults in Wide Area Network (WAN) communications. Kestrel's architecture is based upon the pilot job frameworks used extensively in Grid computing, with fault-tolerant communications inspired by command-and-control botnets. The extensibility of XMPP has allowed the development of protocols for identifying manager nodes, discovering the capabilities of worker agents, and distributing tasks. Presence notifications provided by XMPP allow Kestrel to monitor the global state of the pool and to dispatch tasks based on worker availability. Since its inception, Kestrel has been refined based on its performance managing operational scientific workloads from the STAR group at Brookhaven National Laboratory. STAR provided a virtual machine image with applications for simulating proton collisions using PYTHIA and GEANT3. A Kestrel-based Virtual Organization Cluster, created on top of Clemson University's Palmetto cluster, CERN, and Amazon EC2, provided over 400,000 CPU hours of computation over the course of a month, using an average of 800 virtual machine instances per day. The run generated nearly seven terabytes of data and constitutes the largest PYTHIA production run that STAR has achieved to date.
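
To make the presence-based dispatching described in the abstract concrete, the sketch below shows how a manager-side XMPP client might track worker availability and hand out queued tasks. This is a minimal illustration under stated assumptions, not Kestrel's actual implementation: the choice of the SleekXMPP library, the ManagerBot class, the in-memory task list, the example JIDs, and the plain chat-message dispatch are all assumptions made for the example; Kestrel's real protocols for capability discovery and task distribution are richer than this.

    # Hypothetical sketch of presence-based worker tracking; names and the
    # use of SleekXMPP are assumptions for illustration, not Kestrel's code.
    import sleekxmpp


    class ManagerBot(sleekxmpp.ClientXMPP):
        """Tracks worker availability via XMPP presence and hands out tasks."""

        def __init__(self, jid, password, tasks):
            sleekxmpp.ClientXMPP.__init__(self, jid, password)
            self.tasks = list(tasks)      # simple in-memory task queue
            self.available = set()        # bare JIDs of currently idle workers

            self.add_event_handler('session_start', self.start)
            self.add_event_handler('got_online', self.worker_online)
            self.add_event_handler('got_offline', self.worker_offline)

        def start(self, event):
            # Announce the manager's presence and fetch the roster of workers.
            self.send_presence()
            self.get_roster()

        def worker_online(self, presence):
            # An available presence marks the worker as a dispatch candidate.
            worker = presence['from'].bare
            self.available.add(worker)
            self.dispatch(worker)

        def worker_offline(self, presence):
            # Unavailable presence removes the worker from the pool; a real
            # scheduler would also requeue any task that worker was running.
            self.available.discard(presence['from'].bare)

        def dispatch(self, worker):
            # Send one queued task as a plain chat message; Kestrel itself
            # uses richer XMPP extensions for task distribution.
            if self.tasks:
                self.send_message(mto=worker,
                                  mbody=self.tasks.pop(0),
                                  mtype='chat')


    if __name__ == '__main__':
        bot = ManagerBot('manager@example.org', 'secret',
                         tasks=['run-pythia --seed 1', 'run-pythia --seed 2'])
        if bot.connect():
            bot.process(block=True)

In this sketch the worker pool is derived entirely from presence events, which mirrors the idea in the abstract that XMPP presence lets the manager see the global state of the pool without polling individual nodes.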