Extending clusters to Amazon EC2 using the Rocks toolkit

  • Authors:
  • Philip M Papadopoulos

  • Affiliations:
  • University of California and San Diego Supercomputer Center, USA

  • Venue:
  • International Journal of High Performance Computing Applications
  • Year:
  • 2011

Abstract

In 2006, Amazon introduced its Elastic Compute Cloud (EC2), where customers can rent, by the hour, Xen-based virtual machines hosted in Amazon's data center. In this so-called infrastructure as a service (IaaS) cloud, users have full root-level access to virtual machines, so they can fully customize and optionally publish machine images. The generally accepted approach to provisioning within Amazon is to start with a base image already residing in the cloud and then customize this base configuration to match specific requirements. While this works well for very simple, standalone software configurations, it has users starting with a black box of software (the base image) and then adding to or modifying this system using techniques that range from very rigorous (excellent system configuration practices) to completely ad hoc. A quick survey of public machine images available in Amazon's cloud shows 28% growth from September 2010 (~5500 images) to December 2010 (~7050 images). The sheer number of public images makes selecting a base configuration all the more daunting for the non-expert system administrator. In 2004, we introduced Rocks cluster toolkit rolls as pluggable, programmatic components that extend the definition of a Beowulf-style cluster. In contrast to the black-box character of most cloud images, we describe how the Rocks EC2 roll automatically handles the specific administrative changes needed to make any Rocks-defined computing appliance bootable within the EC2 infrastructure. When the EC2 roll is coupled with the Condor roll, it becomes straightforward to build an extended cluster as a single Condor pool with local job submission. This extended pool consists of both the user's local cluster infrastructure (head node and local worker nodes) and EC2 nodes.
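The 28% figure follows directly from the two approximate image counts quoted above. The short Python check below simply recomputes the relative growth; the function name is illustrative and the counts are the rounded values given in the abstract, not raw survey data.

```python
def percent_growth(before: int, after: int) -> float:
    """Return the relative growth from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# Approximate public EC2 image counts quoted in the abstract.
sept_2010 = 5500
dec_2010 = 7050

growth = percent_growth(sept_2010, dec_2010)
print(f"{growth:.1f}%")  # (7050 - 5500) / 5500 = 0.2818..., i.e. about 28%
```

Because both counts are rounded, the result (about 28.2%) should be read as approximate, consistent with the abstract's "28%".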
Because the EC2 nodes are configured identically to their local counterparts, users need not modify their job submission scripts, executable paths or other job parameters simply to take advantage of cloud resources. These extensions are included in the EC2 and Condor rolls and enable the systematic reuse of software already developed for clusters. These systems can function in the cloud either as standalone entities or as integrated components of a user's existing cluster. With this work we demonstrate that cloud computing does not require entirely new approaches to system definition and use.
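Why identical configuration lets an unchanged job run on cloud nodes can be illustrated with a toy model of attribute-based matching. The Python sketch below is loosely inspired by Condor-style matchmaking but is not HTCondor's ClassAd language, its API, or the Rocks rolls themselves. The `Node` and `Job` classes, the `has_myapp` attribute, the node names and the executable path are all hypothetical, chosen only to show the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A worker node and the attributes it advertises (hypothetical fields)."""
    name: str
    site: str  # "local" or "ec2"
    attrs: dict = field(default_factory=dict)


@dataclass
class Job:
    """A job described by the attributes it requires, loosely modeled on a submit file."""
    executable: str
    requirements: dict

    def matches(self, node: Node) -> bool:
        # A node matches if it advertises every required attribute with the same value.
        return all(node.attrs.get(k) == v for k, v in self.requirements.items())


# One appliance definition shared by local and EC2 nodes (illustrative values).
appliance = {"OpSys": "LINUX", "Arch": "X86_64", "has_myapp": True}

pool = [
    Node("compute-0-0", "local", dict(appliance)),
    Node("compute-0-1", "local", dict(appliance)),
    Node("vm-ec2-0", "ec2", dict(appliance)),
    # A node built ad hoc from a generic base image, lacking the application.
    Node("generic-ami", "ec2", {"OpSys": "LINUX", "Arch": "X86_64"}),
]

# The same job description is used regardless of where it ends up running.
job = Job("/opt/myapp/bin/run", {"OpSys": "LINUX", "has_myapp": True})
eligible = [n.name for n in pool if job.matches(n)]
print(eligible)  # ['compute-0-0', 'compute-0-1', 'vm-ec2-0']
```

In this toy model, the EC2 node built from the shared appliance definition matches the unmodified job exactly as the local nodes do, while the node built from a generic base image does not, which mirrors the abstract's contrast between defined appliances and black-box images.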