A set of level 3 basic linear algebra subprograms
ACM Transactions on Mathematical Software (TOMS)
Low-storage, explicit Runge-Kutta schemes for the compressible Navier-Stokes equations
Applied Numerical Mathematics
A scalable cross-platform infrastructure for application performance tuning using hardware counters
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
OpenMP: An Industry-Standard API for Shared-Memory Programming
IEEE Computational Science & Engineering
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Cray XT4: an early evaluation for petascale scientific simulation
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Early evaluation of IBM BlueGene/P
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Overview of the Blue Gene/L system architecture
IBM Journal of Research and Development
Early evaluation of the cray XT3
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Hi-index | 0.00 |
An upgrade from dual-core to quad-core AMD processor on the Cray XT system at the Oak Ridge National Laboratory (ORNL) Leadership Computing Facility (LCF) has resulted in significant changes in the hardware and software stack, including a deeper memory hierarchy, SIMD instructions and a multi-core aware MPI library. In this paper, we evaluate impact of a subset of these key changes on large-scale scientific applications. We will provide insights into application tuning and optimization process and report on how different strategies yield varying rates of successes and failures across different application domains. For instance, we demonstrate that the vectorization instructions (SSE) provide a performance boost of as much as 50% on fusion and combustion applications. Moreover, we reveal how the resource contentions could limit the achievable performance and provide insights into how application could exploit Petascale XT5 system's hierarchical parallelism.