Early evaluation of IBM BlueGene/P

Authors:
S. Alam; R. Barrett; M. Bast; M. R. Fahey; J. Kuehn; C. McCurdy; J. Rogers; P. Roth; R. Sankaran; J. S. Vetter; P. Worley; W. Yu
Affiliations:
Oak Ridge National Laboratory, Oak Ridge, TN (all authors)
Venue:
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Year:
2008

Abstract

BlueGene/P (BG/P) is IBM's second-generation BlueGene architecture, succeeding BlueGene/L (BG/L). BG/P is a system-on-a-chip (SoC) design that uses four PowerPC 450 cores operating at 850 MHz, each with a double-precision, dual-pipe floating-point unit. The chips are connected by multiple interconnection networks, including a 3-D torus, a global collective network, and a global barrier network. The design is intended to provide a highly scalable, physically dense system with relatively low power requirements per flop. In this paper, we report on our examination of BG/P, presented in the context of a set of important scientific applications and compared with other major large-scale supercomputers in use today. Our investigation confirms that BG/P scales well and, as expected, delivers lower performance per processor than the Cray XT4's Opteron. We also find that BG/P uses very low power per floating-point operation for certain kernels, yet its power advantage is smaller when measured by science-driven metrics for mission applications.
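
As a rough illustration of what the core count, clock rate, and dual-pipe floating-point unit quoted in the abstract imply, the sketch below computes a theoretical peak rate per core and per node. The core count and clock come from the abstract; the figure of two floating-point operations per pipe per cycle (one fused multiply-add) is an assumption added here, and the function and constant names are hypothetical. The result is an upper bound, not a measurement from the paper.

```python
# Hypothetical back-of-the-envelope helper; names are illustrative only.

CORES_PER_NODE = 4             # from the abstract: four PowerPC 450 cores
CLOCK_HZ = 850e6               # from the abstract: 850 MHz
PIPES_PER_FPU = 2              # from the abstract: "dual pipe" FPU per core
FLOPS_PER_PIPE_PER_CYCLE = 2   # ASSUMPTION: one fused multiply-add per pipe per cycle


def peak_flops_per_core(clock_hz=CLOCK_HZ,
                        pipes=PIPES_PER_FPU,
                        flops_per_pipe=FLOPS_PER_PIPE_PER_CYCLE):
    """Theoretical peak double-precision flop/s for a single core."""
    return clock_hz * pipes * flops_per_pipe


def peak_flops_per_node(cores=CORES_PER_NODE):
    """Theoretical peak double-precision flop/s for one four-core chip."""
    return cores * peak_flops_per_core()


if __name__ == "__main__":
    print(f"per core: {peak_flops_per_core() / 1e9:.1f} GFlop/s")
    print(f"per node: {peak_flops_per_node() / 1e9:.1f} GFlop/s")
```

Under these assumptions the per-node peak is four times the per-core peak; sustained application performance, which is what the paper evaluates, will generally be lower.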