The IBM Blue Gene®/Q platform presents scientists and engineers with a rich set of hardware features such as 16 cores per chip sharing a Level 2 cache, a wide SIMD (single-instruction, multiple-data) unit, a five-dimensional torus network, and hardware support for collective operations. An especially important feature is that the cores have four "hardware threads," which makes it possible to hide latencies and obtain a high fraction of the peak issue rate from each core. All of these hardware resources present unique performance-tuning opportunities on Blue Gene/Q. We provide an overview of several important applications and solvers and study them on Blue Gene/Q using performance counters and Message Passing Interface profiles. We discuss how Blue Gene/Q tools help us understand the interaction of the application with the hardware and software layers and provide guidance for optimization. On the basis of our analysis, we discuss code improvement strategies targeting Blue Gene/Q. Information about how these algorithms map to the Blue Gene® architecture is expected to have an impact on future system design as we move to the exascale era.