BlueGene/P (BG/P) is the second-generation BlueGene architecture from IBM, succeeding BlueGene/L (BG/L). BG/P is a system-on-a-chip (SoC) design that uses four PowerPC 450 cores operating at 850 MHz, each with a double-precision, dual-pipe floating-point unit. These chips are connected by multiple interconnection networks, including a 3-D torus, a global collective network, and a global barrier network. The design is intended to provide a highly scalable, physically dense system with relatively low power requirements per flop. In this paper, we report on our examination of BG/P, presented in the context of a set of important scientific applications and compared with other major large-scale supercomputers in use today. Our investigation confirms that BG/P scales well and, as expected, delivers lower performance per processor than the Opteron in the Cray XT4. We also find that BG/P uses very low power per floating-point operation for certain kernels, yet its power advantage is smaller when measured with science-driven metrics for mission applications.
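The hardware figures quoted in the abstract imply a theoretical peak rate per chip, which the following minimal sketch works out. The core count, clock rate, and number of floating-point pipes come from the abstract. The value of two flops per pipe per cycle is an assumption not stated there (one fused multiply-add per pipe per cycle), and the function and constant names are purely illustrative.

```python
# Back-of-the-envelope peak floating-point rate for one BG/P chip.

CORES_PER_CHIP = 4            # from the abstract: four PowerPC 450 cores
CLOCK_HZ = 850e6              # from the abstract: 850 MHz
FPU_PIPES = 2                 # from the abstract: dual-pipe floating-point unit
FLOPS_PER_PIPE_PER_CYCLE = 2  # ASSUMPTION: one fused multiply-add per pipe per cycle


def peak_gflops(cores=CORES_PER_CHIP,
                clock_hz=CLOCK_HZ,
                pipes=FPU_PIPES,
                flops_per_pipe=FLOPS_PER_PIPE_PER_CYCLE):
    """Return the theoretical peak in GFlop/s for the given configuration."""
    return cores * clock_hz * pipes * flops_per_pipe / 1e9


if __name__ == "__main__":
    print(f"Per core: {peak_gflops(cores=1):.1f} GFlop/s")
    print(f"Per chip: {peak_gflops():.1f} GFlop/s")
```

Under these assumptions the estimate is 3.4 GFlop/s per core and 13.6 GFlop/s per chip. This is an upper bound only; the sustained application performance discussed in the paper depends on memory and network behavior that this arithmetic ignores.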