Power and performance optimization at the system level

  • Authors:
  • Valentina Salapura;Randy Bickford;Matthias Blumrich;Arthur A. Bright;Dong Chen;Paul Coteus;Alan Gara;Mark Giampapa;Michael Gschwind;Manish Gupta;Shawn Hall;Ruud A. Haring;Philip Heidelberger;Dirk Hoenicke;Gerard V. Kopcsay;Martin Ohmacht;Rick A. Rand;Todd Takken;Pavlos Vranas

  • Affiliations:
  • IBM T.J. Watson Research Center;IBM T.J. Watson Research Center;IBM T.J. Watson Research Center;IBM T.J. Watson Research Center;IBM T.J. Watson Research Center;IBM T.J. Watson Research Center;IBM T.J. Watson Research Center;IBM T.J. Watson Research Center;IBM T.J. Watson Research Center;IBM T.J. Watson Research Center;IBM T.J. Watson Research Center;IBM T.J. Watson Research Center;IBM T.J. Watson Research Center;IBM T.J. Watson Research Center;IBM T.J. Watson Research Center;IBM T.J. Watson Research Center;IBM T.J. Watson Research Center;IBM T.J. Watson Research Center;IBM T.J. Watson Research Center

  • Venue:
  • Proceedings of the 2nd conference on Computing frontiers
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

The BlueGene/L supercomputer has been designed with a focus on power/performance efficiency to achieve high application performance under the thermal constraints of common data centers. To achieve this goal, emphasis was put on system solutions to engineer a power-efficient system. To exploit thread level parallelism, the BlueGene/L system can scale to 64 racks with a total of 65536 computer nodes consisting of a single compute ASIC integrating all system functions with two industry-standard PowerPC microprocessor cores in a chip multiprocessor configuration. Each PowerPC processor exploits data-level parallelism with a high-performance SIMD oating point unitTo support good application scaling on such a massive system, special emphasis was put on efficient communication primitives by including five highly optimized communification networks. After an initial introduction of the Blue-Gene/L system architecture, we analyze power/performance efficiency for the BlueGene system using performance and power characteristics for the overall system performance (as exemplified by peak performance numbers.To understand application scaling behavior, and its impact on performance and power/performance efficiency, we analyze the NAMD molecular dynamics package using the ApoA1 benchmark. We find that even for strong scaling problems, BlueGene/L systems can deliver superior performance scaling and deliver significant power/performance efficiency. Application benchmark power/performance scaling for the voltage-invariant energy delay 2 power/performance metric demonstrates that choosing a power-efficient 700MHz embedded PowerPC processor core and relying on application parallelism was the right decision to build a powerful, and power/performance efficient system