The Cray BlackWidow: a highly scalable vector multiprocessor

  • Authors:
  • Dennis Abts; Abdulla Bataineh; Steve Scott; Greg Faanes; Jim Schwarzmeier; Eric Lundberg; Tim Johnson; Mike Bye; Gerald Schwoerer

  • Affiliations:
  • Cray Inc., Chippewa Falls, Wisconsin (all authors)

  • Venue:
  • Proceedings of the 2007 ACM/IEEE conference on Supercomputing
  • Year:
  • 2007

Abstract

This paper describes the system architecture of the Cray BlackWidow scalable vector multiprocessor. The BlackWidow system is a distributed shared memory (DSM) architecture that is scalable to 32K processors, each with a 4-way dispatch scalar execution unit and an 8-pipe vector unit capable of 20.8 Gflops for 64-bit operations and 41.6 Gflops for 32-bit operations at the prototype operating frequency of 1.3 GHz. Global memory is directly accessible with processor loads and stores and is globally coherent. The system supports thousands of outstanding references to hide remote memory latencies, and provides a rich suite of built-in synchronization primitives. Each BlackWidow node is implemented as a 4-way SMP with up to 128 Gbytes of DDR2 main memory capacity. The system supports common programming models such as MPI and OpenMP, as well as global address space languages such as UPC and CAF. We describe the system architecture and microarchitecture of the processor, memory controller, and router chips. We give preliminary performance results and discuss design tradeoffs.
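The peak rates quoted in the abstract can be reproduced with simple arithmetic. The sketch below assumes that each vector pipe completes two 64-bit floating-point operations per cycle (for example one multiply and one add) and that 32-bit operations run at twice that rate. These per-pipe figures are assumptions chosen to match the abstract's totals; the abstract itself states only the pipe count, the clock frequency, and the resulting Gflops. The function and constant names are illustrative.

```python
def peak_gflops(pipes: int, flops_per_pipe_per_cycle: int, clock_ghz: float) -> float:
    """Theoretical peak rate in Gflops: pipes * flops per pipe per cycle * clock (GHz)."""
    return pipes * flops_per_pipe_per_cycle * clock_ghz


PIPES = 8          # vector pipes per processor (from the abstract)
CLOCK_GHZ = 1.3    # prototype operating frequency (from the abstract)

# Assumption: 2 flops per pipe per cycle at 64-bit, 4 at 32-bit.
peak_64 = peak_gflops(PIPES, 2, CLOCK_GHZ)
peak_32 = peak_gflops(PIPES, 4, CLOCK_GHZ)

print(f"64-bit peak: {peak_64:.1f} Gflops")
print(f"32-bit peak: {peak_32:.1f} Gflops")
```

Under these assumptions the results agree with the 20.8 Gflops and 41.6 Gflops figures given in the abstract. These are theoretical peaks per processor, not measured performance.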