Performance Evaluation of the Cray X1 Distributed Shared-Memory Architecture

Authors:
Thomas H. Dunigan Jr.; Jeffrey S. Vetter; James B. White III; Patrick H. Worley
Affiliations:
Oak Ridge National Laboratory; Oak Ridge National Laboratory; Oak Ridge National Laboratory; Oak Ridge National Laboratory
Venue:
IEEE Micro
Year:
2005

Abstract

The Cray X1 supercomputer's distributed shared memory presents a 64-bit global address space that is directly addressable from every multistreaming processor (MSP), with a ratio of interconnect bandwidth to computation rate of 1 byte/flop. Our results show that this high bandwidth and low latency for remote memory accesses translate into improved performance on important applications.