NAMD2: greater scalability for parallel molecular dynamics
Journal of Computational Physics - Special issue on computational molecular biophysics
Towards an energy complexity of computation
Information Processing Letters - Special issue in honor of Edsger W. Dijkstra
Energy-delay efficiency of VLSI computations
Proceedings of the 12th ACM Great Lakes symposium on VLSI
ET2: a metric for time and energy efficiency of computation
Power aware computing
Overview of the Blue Gene/L system architecture
IBM Journal of Research and Development
Blue Gene/L advanced diagnostics environment
IBM Journal of Research and Development
Chip multiprocessing and the cell broadband engine
Proceedings of the 3rd conference on Computing frontiers
Improving the accuracy of snoop filtering using stream registers
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
The blue gene/L supercomputer: a hardware and software story
International Journal of Parallel Programming
The cell broadband engine: exploiting multiple levels of parallelism in a chip multiprocessor
International Journal of Parallel Programming
Low-complexity policies for energy-performance tradeoff in chip-multi-processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Environment-conscious scheduling of HPC applications on distributed Cloud-oriented data centers
Journal of Parallel and Distributed Computing
Exploring the architecture of a stream register-based snoop filter
Transactions on high-performance embedded architectures and compilers III
Hi-index | 0.00 |
The BlueGene/L supercomputer has been designed with a focus on power/performance efficiency to achieve high application performance under the thermal constraints of common data centers. To achieve this goal, emphasis was put on system solutions to engineer a power-efficient system. To exploit thread level parallelism, the BlueGene/L system can scale to 64 racks with a total of 65536 computer nodes consisting of a single compute ASIC integrating all system functions with two industry-standard PowerPC microprocessor cores in a chip multiprocessor configuration. Each PowerPC processor exploits data-level parallelism with a high-performance SIMD oating point unitTo support good application scaling on such a massive system, special emphasis was put on efficient communication primitives by including five highly optimized communification networks. After an initial introduction of the Blue-Gene/L system architecture, we analyze power/performance efficiency for the BlueGene system using performance and power characteristics for the overall system performance (as exemplified by peak performance numbers.To understand application scaling behavior, and its impact on performance and power/performance efficiency, we analyze the NAMD molecular dynamics package using the ApoA1 benchmark. We find that even for strong scaling problems, BlueGene/L systems can deliver superior performance scaling and deliver significant power/performance efficiency. Application benchmark power/performance scaling for the voltage-invariant energy delay 2 power/performance metric demonstrates that choosing a power-efficient 700MHz embedded PowerPC processor core and relying on application parallelism was the right decision to build a powerful, and power/performance efficient system