Cray XT4: an early evaluation for petascale scientific simulation

Authors:
Sadaf R. Alam;Jeffery A. Kuehn;Richard F. Barrett;Jeff M. Larkin;Mark R. Fahey;Ramanan Sankaran;Patrick H. Worley
Affiliations:
Oak Ridge National Laboratory, Oak Ridge, Tennessee;Oak Ridge National Laboratory, Oak Ridge, Tennessee;Oak Ridge National Laboratory, Oak Ridge, Tennessee;Cray Inc, Seattle, Washington;Oak Ridge National Laboratory, Oak Ridge, Tennessee;Oak Ridge National Laboratory, Oak Ridge, Tennessee;Oak Ridge National Laboratory, Oak Ridge, Tennessee
Venue:
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Year:
2007

Abstract

The scientific simulation capabilities of next-generation high-end computing technology will depend on striking a balance among memory, processor, I/O, and local and global network performance across the breadth of the scientific simulation space. The Cray XT4 combines commodity AMD dual-core Opteron processor technology with the second generation of Cray's custom communication accelerator in a system design whose balance is claimed to be driven by the demands of scientific simulation. This paper evaluates the Cray XT4, using micro-benchmarks to develop a controlled understanding of individual system components and so provide the context for analyzing the performance of several petascale-ready applications. Results gathered from several strategic application domains are compared with observations on the previous-generation Cray XT3 and other high-end computing systems, demonstrating performance improvements across a wide variety of application benchmark problems.