Configuring Large High-Performance Clusters at Lightspeed: A Case Study

  • Authors:
  • Philip M. Papadopoulos, Caroline A. Papadopoulos, Mason J. Katz, William J. Link, Greg Bruno

  • Affiliations:
  • Philip M. Papadopoulos, Mason J. Katz, William J. Link, Greg Bruno: The San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093-0505
  • Caroline A. Papadopoulos: Physical Oceanography Research Division, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA 92093-0505

  • Venue:
  • International Journal of High Performance Computing Applications
  • Year:
  • 2004

Abstract

Over a decade ago, the TOP500 list was started as a way to measure supercomputers by their sustained performance on a particular linear algebra benchmark. Once the preserve of exotic machines and extremely well-funded centers and laboratories, high-performance computing is now within reach of smaller groups, which can deploy and use commodity clusters in their own laboratories. This paper describes a weekend activity in which two existing 128-node commodity clusters were fused into a single 256-node cluster for the specific purpose of running the benchmark used to rank the machines on the TOP500 supercomputer list. The resulting metacluster sits on the November 2002 list at position 233. A key differentiator for this cluster is that its software was assembled from the NPACI Rocks open-source cluster toolkit as downloaded from the public website. The toolkit allows users who are not cluster experts to deploy and run supercomputer-class machines in a matter of hours instead of weeks or months. With the exception of recompiling the University of Tennessee's Automatically Tuned Linear Algebra Subroutines (ATLAS) library with a recommended version of the GNU C compiler, this metacluster ran a "stock" Rocks distribution. Successful first-time deployment of the fused cluster was completed in a scant 6 h. Partitioning of the metacluster and restoration of the two 128-node clusters to their original configurations was completed in just over 40 min. This paper describes early (pre-weekend) benchmark activities undertaken to empirically determine reasonably good parameters for the High Performance Linpack (HPL) code on both Ethernet and Myrinet interconnects. It fully describes the physical layout of the machine and the description-based installation methods used in Rocks to redeploy two independent clusters as a single cluster, and it gives the benchmark results gathered over the 40-h period allotted for the complete experiment. In addition, we describe some of the online monitoring and measurement techniques that were employed during the experiment. Finally, we point out the issues uncovered with a commodity cluster of this size. The techniques presented in this paper truly bring supercomputers into the hands of computational scientists at large.
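The "reasonably good parameters" mentioned above refer to HPL's tunables such as the problem size N and block size NB. As a minimal sketch of the usual starting point for that search (not the parameters or hardware figures reported in the paper), the snippet below derives N from the common rule of thumb that the N x N matrix of doubles should fill roughly 80% of aggregate memory; the node count, per-node memory, and NB value are hypothetical placeholders, and NB in practice is tuned empirically against the BLAS (here, ATLAS) DGEMM kernel.

```python
# Back-of-the-envelope HPL problem-size estimate (rule of thumb only; the
# paper's actual HPL parameters were determined empirically).
import math


def estimate_hpl_params(nodes, mem_per_node_gib, mem_fraction=0.80, nb=128):
    """Return a suggested (N, NB) pair for HPL on a homogeneous cluster."""
    total_bytes = nodes * mem_per_node_gib * 1024 ** 3
    # The coefficient matrix holds N*N doubles (8 bytes each); solve
    # 8 * N^2 <= mem_fraction * total_bytes for N.
    n_max = math.isqrt(int(mem_fraction * total_bytes / 8))
    # Round N down to a multiple of NB so the block-cyclic decomposition
    # divides the matrix evenly.
    n = (n_max // nb) * nb
    return n, nb


if __name__ == "__main__":
    # Hypothetical figures for illustration: 256 nodes with 1 GiB of RAM each.
    n, nb = estimate_hpl_params(nodes=256, mem_per_node_gib=1)
    print(f"Suggested HPL.dat starting point: N = {n}, NB = {nb}")
```

The resulting N is only a first guess; in practice it is adjusted downward to leave headroom for the operating system and MPI buffers, and NB, the process grid (P x Q), and the broadcast algorithm are then swept empirically on the target interconnect.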