Performance of the CRAY T3E multiprocessor

  • Authors:
  • Ed Anderson; Jeff Brooks; Charles Grassl; Steve Scott

  • Affiliations:
  • Cray Research, Inc.; Cray Research, Inc.; Cray Research, Inc.; Cray Research, Inc.

  • Venue:
  • SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
  • Year:
  • 1997

Abstract

The CRAY T3E is a scalable shared-memory multiprocessor based on the DEC Alpha 21164 microprocessor. The system includes a number of novel architectural features designed to tolerate latency, enhance scalability, and deliver high performance on scientific and engineering codes. Included among these are stream buffers, which detect and prefetch down small-stride reference streams; E-registers, which provide latency hiding and non-unit-stride access capabilities; barrier and fetch_and_op synchronization support; and a scalable, high-bandwidth interconnection network.

This paper reports our experiences with the CRAY T3E and presents a variety of performance measurements. Section 2 provides a brief overview of the system architecture. Section 3 describes the latency-hiding features (caches, stream buffers, and E-registers) in more detail, assesses their performance impact, and discusses coding techniques for using them. Section 4 presents single-processor performance results. Finally, Section 5 discusses system scalability.