The Cray X1 supercomputer's distributed shared memory presents a 64-bit global address space that is directly addressable from every multi-streaming processor (MSP), with a ratio of interconnect bandwidth to computation rate of 1 byte/flop. Our results show that this high bandwidth and low latency for remote memory accesses translate into improved performance on important applications.