This paper presents the results and experiences of adapting and improving the Many-Task Computing (MTC) framework Kestrel for use with bag-of-tasks applications, and with the STAR experiment in particular. Kestrel is a lightweight, highly available job scheduling framework for Virtual Organization Clusters (VOCs) constructed in the cloud. Kestrel uses the Extensible Messaging and Presence Protocol (XMPP) to increase MTC platform scalability and to mitigate faults in Wide Area Network (WAN) communications. Kestrel's architecture is based upon pilot job frameworks used extensively in Grid computing, with fault-tolerant communications inspired by command-and-control botnets. The extensibility of XMPP has allowed the development of protocols for identifying manager nodes, for discovering the capabilities of worker agents, and for distributing tasks. Presence notifications provided by XMPP allow Kestrel to monitor the global state of the pool and to dispatch tasks based on worker availability. Since its inception, Kestrel has been modified in response to its performance in managing operational scientific workloads from the STAR group at Brookhaven National Laboratory. STAR provided a virtual machine image with applications for simulating proton collisions using PYTHIA and GEANT3. A Kestrel-based Virtual Organization Cluster, created on top of Clemson University's Palmetto cluster, CERN, and Amazon EC2, provided over 400,000 CPU hours of computation over the course of a month, using an average of 800 virtual machine instances every day. The run generated nearly seven terabytes of data and is the largest PYTHIA production run that STAR has achieved to date.
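The presence-driven dispatching described above can be illustrated with a minimal sketch. It is not Kestrel's implementation and uses no XMPP library: presence notifications are simulated by direct method calls, and the class name `PresenceDispatcher`, the hooks `on_presence` and `on_finished`, and the policy of re-queuing a task when its worker goes offline are all assumptions made for illustration.

```python
from collections import deque


class PresenceDispatcher:
    """Toy manager: tracks worker presence and hands out queued tasks."""

    def __init__(self, tasks):
        self.pending = deque(tasks)   # tasks not yet assigned
        self.idle = set()             # workers that announced availability
        self.running = {}             # worker id -> task in progress
        self.done = []                # completed tasks, in completion order

    def on_presence(self, worker, available):
        """Handle a (simulated) presence notification for a worker."""
        if available:
            # Ignore repeated "available" signals from a busy worker.
            if worker not in self.running:
                self.idle.add(worker)
        else:
            self.idle.discard(worker)
            # Assumed policy: a worker that vanished mid-task has its
            # task put back at the front of the queue.
            task = self.running.pop(worker, None)
            if task is not None:
                self.pending.appendleft(task)
        self._dispatch()

    def on_finished(self, worker):
        """Worker reports completion and becomes idle again."""
        self.done.append(self.running.pop(worker))
        self.idle.add(worker)
        self._dispatch()

    def _dispatch(self):
        # Match idle workers with pending tasks until either runs out.
        while self.pending and self.idle:
            worker = min(self.idle)   # deterministic choice for the sketch
            self.idle.remove(worker)
            self.running[worker] = self.pending.popleft()
```

For example, with three queued tasks, calling `on_presence("w1", True)` and `on_presence("w2", True)` assigns the first two tasks; a later `on_presence("w1", False)` returns the first task to the queue, and it is handed to `w2` as soon as `w2` reports completion through `on_finished("w2")`. In a real deployment the manager would presumably learn of these events from the XMPP server rather than from direct calls.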