s-step iterative methods for symmetric linear systems
Journal of Computational and Applied Mathematics
Co-array Fortran for parallel programming
ACM SIGPLAN Fortran Forum
Low-storage, explicit Runge-Kutta schemes for the compressible Navier-Stokes equations
Applied Numerical Mathematics
OpenMP: An Industry-Standard API for Shared-Memory Programming
IEEE Computational Science & Engineering
Practical performance portability in the Parallel Ocean Program (POP): Research Articles
Concurrency and Computation: Practice & Experience - The High Performance Architectural Challenge: Mass Market versus Proprietary Components?
Anatomy of high-performance matrix multiplication
ACM Transactions on Mathematical Software (TOMS)
Achieving strong scaling with NAMD on blue Gene/L
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Early evaluation of the cray XT3
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Early evaluation of IBM BlueGene/P
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Impact of Quad-Core Cray XT4 System and Software Stack on Scientific Computation
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Diagnosing performance bottlenecks in emerging petascale applications
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Efficient object storage journaling in a distributed parallel file system
FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies
Exploiting 162-Nanosecond End-to-End Communication Latency on Anton
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Optimal Utilization of Heterogeneous Resources for Biomolecular Simulations
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
International Journal of High Performance Computing Applications
Minimal-overhead virtualization of a large scale supercomputer
Proceedings of the 7th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Indirect cube: A power-efficient topology for compute clusters
Optical Switching and Networking
Portable explicit threading and concurrent programming for MPI applications
PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part II
Concurrency and Computation: Practice & Experience
Concurrent programming constructs for parallel MPI applications
The Journal of Supercomputing
Computer performance analysis and the Pi Theorem
Computer Science - Research and Development
The Experience in Designing and Evaluating the High Performance Cluster Netuno
International Journal of Parallel Programming
Hi-index | 0.00 |
The scientific simulation capabilities of next generation high-end computing technology will depend on striking a balance among memory, processor, I/O, and local and global network performance across the breadth of the scientific simulation space. The Cray XT4 combines commodity AMD dual core Opteron processor technology with the second generation of Cray's custom communication accelerator in a system design whose balance is claimed to be driven by the demands of scientific simulation. This paper presents an evaluation of the Cray XT4 using micro-benchmarks to develop a controlled understanding of individual system components, providing the context for analyzing and comprehending the performance of several petascale-ready applications. Results gathered from several strategic application domains are compared with observations on the previous generation Cray XT3 and other high-end computing systems, demonstrating performance improvements across a wide variety of application benchmark problems.