The CRAY T3E is a scalable shared-memory multiprocessor based on the DEC Alpha 21164 microprocessor. The system includes a number of novel architectural features designed to tolerate latency, enhance scalability, and deliver high performance on scientific and engineering codes. Included among these are stream buffers, which detect and prefetch down small-stride reference streams; E-registers, which provide latency hiding and non-unit-stride access capabilities; barrier and fetch_and_op synchronization support; and a scalable, high-bandwidth interconnection network.

This paper reports our experiences with the CRAY T3E and presents a variety of performance measurements. Section 2 provides a brief overview of the system architecture. Section 3 describes the latency-hiding features (caches, stream buffers and E-registers) in more detail, assesses their performance impact, and discusses coding techniques for using them. Section 4 presents single-processor performance results. Finally, Section 5 discusses system scalability.