The Cray BlackWidow: a highly scalable vector multiprocessor

Authors:
Dennis Abts;Abdulla Bataineh;Steve Scott;Greg Faanes;Jim Schwarzmeier;Eric Lundberg;Tim Johnson;Mike Bye;Gerald Schwoerer
Affiliations:
Cray Inc., Chippewa Falls, Wisconsin;Cray Inc., Chippewa Falls, Wisconsin;Cray Inc., Chippewa Falls, Wisconsin;Cray Inc., Chippewa Falls, Wisconsin;Cray Inc., Chippewa Falls, Wisconsin;Cray Inc., Chippewa Falls, Wisconsin;Cray Inc., Chippewa Falls, Wisconsin;Cray Inc., Chippewa Falls, Wisconsin;Cray Inc., Chippewa Falls, Wisconsin
Venue:
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Year:
2007

Citing 12
Cited 16

Tarantula: a vector extension to the alpha architecture

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A Simulation Study of Decoupled Vector Architectures

The Journal of Supercomputing
Decoupled access/execute computer architectures

ISCA '82 Proceedings of the 9th annual symposium on Computer Architecture
So Many States, So Little Time: Verifying Memory Coherence in the Cray X1

IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
Performance characteristics of the Cray X1 and their implications for application performance tuning

Proceedings of the 18th annual international conference on Supercomputing
Evaluating support for global address space languages on the Cray X1

Proceedings of the 18th annual international conference on Supercomputing
Cache Refill/Access Decoupling for Vector Machines

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
The Soft Error Problem: An Architectural Perspective

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Early Evaluation of the Cray X1

Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Microarchitecture of a High-Radix Router

Proceedings of the 32nd annual international symposium on Computer Architecture
Leading Computational Methods on Scalar and Vector HEC Platforms

SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
The BlackWidow High-Radix Clos Network

Proceedings of the 33rd annual international symposium on Computer Architecture

Technology-Driven, Highly-Scalable Dragonfly Topology

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Atomic Vector Operations on Chip Multiprocessors

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Benchmarking GPUs to tune dense linear algebra

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
A shared cache for a chip multi vector processor

Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
Performance tuning and analysis of future vector processors based on the roofline model

Proceedings of the 10th workshop on MEmory performance: DEaling with Applications, systems and architecture
Performance evaluation of NEC SX-9 using real science and engineering applications

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
On-Chip Network Evaluation Framework

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Adaptive granularity memory systems: a tradeoff between storage efficiency and throughput

Proceedings of the 38th annual international symposium on Computer architecture
Exploiting communication and packaging locality for cost-effective large scale networks

Proceedings of the 26th ACM international conference on Supercomputing
The dynamic granularity memory system

Proceedings of the 39th Annual International Symposium on Computer Architecture
Cray cascade: a scalable HPC system based on a Dragonfly network

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Vector Extensions for Decision Support DBMS Acceleration

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
The power 775 architecture at scale

Proceedings of the 27th international ACM conference on International conference on supercomputing
Exploring the Tradeoffs between Programmability and Efficiency in Data-Parallel Accelerators

ACM Transactions on Computer Systems (TOCS)
Scalable high-radix router microarchitecture using a network switch organization

ACM Transactions on Architecture and Code Optimization (TACO)
A locality-aware memory hierarchy for energy-efficient GPU architectures

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper describes the system architecture of the Cray BlackWidow scalable vector multiprocessor. The BlackWidow system is a distributed shared memory (DSM) architecture that is scalable to 32K processors, each with a 4-way dispatch scalar execution unit and an 8-pipe vector unit capable of 20.8 Gflops for 64-bit operations and 41.6 Gflops for 32-bit operations at the prototype operating frequency of 1.3 GHz. Global memory is directly accessible with processor loads and stores and is globally coherent. The system supports thousands of outstanding references to hide remote memory latencies, and provides a rich suite of built-in synchronization primitives. Each BlackWidow node is implemented as a 4-way SMP with up to 128 Gbytes of DDR2 main memory capacity. The system supports common programming models such as MPI and OpenMP, as well as global address space languages such as UPC and CAF. We describe the system architecture and microarchitecture of the processor, memory controller, and router chips. We give preliminary performance results and discuss design tradeoffs.