An Evaluation of the Oak Ridge National Laboratory Cray XT3

Authors:
Sadaf R. Alam; Richard F. Barrett; Mark R. Fahey; Jeffery A. Kuehn; O.E. Bronson Messer; Richard T. Mills; Philip C. Roth; Jeffrey S. Vetter; Patrick H. Worley
Affiliations:
Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA (all authors)
Venue:
International Journal of High Performance Computing Applications
Year:
2008

Abstract

In 2005, Oak Ridge National Laboratory (ORNL) received delivery of a 5,294-processor Cray XT3. The XT3 is Cray's third-generation massively parallel processing system. The ORNL system uses a single-processor node built around the AMD Opteron, with a custom chip called SeaStar for interprocessor communication. Its compute nodes run a lightweight operating system called Catamount. This paper provides a performance evaluation of the Cray XT3, including measurements from micro-benchmarks, kernels, and application benchmarks. In particular, we present performance results for strategic Department of Energy (DOE) application areas, including climate, biology, astrophysics, combustion, and fusion. Our results, on up to 4,096 processors, demonstrate that the Cray XT3 provides competitive processor performance, high interconnect bandwidth, and high parallel efficiency on a diverse application workload typical of the DOE Office of Science.
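The abstract summarizes its findings in terms of parallel efficiency and interconnect bandwidth. As a rough illustration only, here is a minimal sketch of the textbook strong-scaling efficiency and of the one-way bandwidth implied by a ping-pong round trip. The function names are hypothetical, and the paper may define or normalize these metrics differently (for example, against a baseline run on more than one processor, or using weak rather than strong scaling).

```python
def parallel_efficiency(t_ref: float, p_ref: int, t_p: float, p: int) -> float:
    """Strong-scaling parallel efficiency relative to a baseline run.

    E(p) = (t_ref * p_ref) / (t_p * p); a value of 1.0 means perfect scaling
    from p_ref to p processors for a fixed problem size.
    """
    if min(t_ref, t_p) <= 0 or min(p_ref, p) <= 0:
        raise ValueError("times and processor counts must be positive")
    return (t_ref * p_ref) / (t_p * p)


def pingpong_bandwidth(message_bytes: int, round_trip_seconds: float) -> float:
    """One-way bandwidth in bytes/s from a ping-pong round-trip time.

    In a ping-pong test the message crosses the link twice per round trip,
    so the one-way transfer time is half the measured round-trip time.
    """
    if message_bytes < 0 or round_trip_seconds <= 0:
        raise ValueError("invalid message size or timing")
    return message_bytes / (round_trip_seconds / 2.0)
```

For example, hypothetical timings (not measurements from the paper) of 100 s on 256 processors and 8 s on 4,096 processors give `parallel_efficiency(100.0, 256, 8.0, 4096) == 0.78125`, i.e. about 78% of ideal speedup.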