The BlackWidow High-Radix Clos Network

Authors:
Steve Scott;Dennis Abts;John Kim;William J. Dally
Affiliations:
Cray Inc., Chippewa Falls, Wisconsin;Cray Inc., Chippewa Falls, Wisconsin;Stanford University;Stanford University
Venue:
Proceedings of the 33rd annual international symposium on Computer Architecture
Year:
2006

Citing 11
Cited 35

Fat-trees: universal networks for hardware-efficient supercomputing

IEEE Transactions on Computers
Deadlock-Free Message Routing in Multiprocessor Interconnection Networks

IEEE Transactions on Computers
Performance Analysis of k-ary n-cube Interconnection Networks

IEEE Transactions on Computers
The turn model for adaptive routing

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The SP2 high-performance switch

IBM Systems Journal
The network architecture of the connection machine CM-5

Journal of Parallel and Distributed Computing
The SGI Origin: a ccNUMA highly scalable server

Proceedings of the 24th annual international symposium on Computer architecture
Virtual-Channel Flow Control

IEEE Transactions on Parallel and Distributed Systems
The Alpha 21364 Network Architecture

HOTI '01 Proceedings of the Ninth Symposium on High Performance Interconnects
Principles and Practices of Interconnection Networks

Principles and Practices of Interconnection Networks
Microarchitecture of a High-Radix Router

Proceedings of the 32nd annual international symposium on Computer Architecture

Adaptive routing in high-radix Clos network

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Flattened butterfly: a cost-efficient topology for high-radix networks

Proceedings of the 34th annual international symposium on Computer architecture
The Cray BlackWidow: a highly scalable vector multiprocessor

Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Technology-Driven, Highly-Scalable Dragonfly Topology

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
High-radix crossbar switches enabled by proximity communication

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Advancing supercomputer performance through interconnection topology synthesis

Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design
Indirect adaptive routing on large scale interconnection networks

Proceedings of the 36th annual international symposium on Computer architecture
Exploring concentration and channel slicing in on-chip network router

NOCS '09 Proceedings of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip
HyperX: topology, routing, and packaging of efficient large-scale networks

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Low-cost router microarchitecture for on-chip networks

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Reducing complexity in tree-like computer interconnection networks

Parallel Computing
Energy proportional datacenter networks

Proceedings of the 37th annual international symposium on Computer architecture
A 128 x 128 x 24Gb/s Crossbar Interconnecting 128 Tiles in a Single Hop and Occupying 6% of Their Area

NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
A first approach to king topologies for on-chip networks

Euro-Par'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part II
An invariant extension method for system area networks of multicore computational systems. An ideal system network

Automation and Remote Control
A learning-based approach to the automated design of MPSoC networks

ARCS'11 Proceedings of the 24th international conference on Architecture of computing systems
Run-time energy management of manycore systems through reconfigurable interconnects

Proceedings of the 21st edition of the Great Lakes Symposium on VLSI
The role of optics in future high radix switch design

Proceedings of the 38th annual international symposium on Computer architecture
2-Dilated flattened butterfly: A nonblocking switching topology for high-radix networks

Computer Communications
Exploiting communication and packaging locality for cost-effective large scale networks

Proceedings of the 26th ACM international conference on Supercomputing
A micro-architectural analysis of switched photonic multi-chip interconnects

Proceedings of the 39th Annual International Symposium on Computer Architecture
A case for random shortcut topologies for HPC interconnects

Proceedings of the 39th Annual International Symposium on Computer Architecture
Looking under the hood of the IBM blue gene/Q network

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Cray cascade: a scalable HPC system based on a Dragonfly network

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Distributed adaptive routing for big-data applications running on data center networks

Proceedings of the eighth ACM/IEEE symposium on Architectures for networking and communications systems
The power 775 architecture at scale

Proceedings of the 27th international ACM conference on Supercomputing
Evaluating on-die interconnects for a 4 TB/s router

Proceedings of the 27th international ACM conference on Supercomputing
Distributed full switch as an ideal system area network for multiprocessor computers

Automation and Remote Control
A comparative study of 20-Gb/s NRZ and duobinary signaling using statistical analysis

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Channel reservation protocol for over-subscribed channels and destinations

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Obtaining the optimal configuration of high-radix Combined switches

Journal of Parallel and Distributed Computing
Scalable high-radix router microarchitecture using a network switch organization

ACM Transactions on Architecture and Code Optimization (TACO)
Memory-centric system interconnect design with hybrid memory cubes

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
A synthetic task model for HPC-grade optical network performance evaluation

IA^3 '13 Proceedings of the 3rd Workshop on Irregular Applications: Architectures and Algorithms
Locality-oblivious cache organization leveraging single-cycle multi-hop NoCs

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems

Abstract

This paper describes the radix-64 folded-Clos network of the Cray BlackWidow scalable vector multiprocessor. We describe the BlackWidow network which scales to 32K processors with a worst-case diameter of seven hops, and the underlying high-radix router microarchitecture and its implementation. By using a high-radix router with many narrow channels we are able to take advantage of the higher pin density and faster signaling rates available in modern ASIC technology. The BlackWidow router is an 800 MHz ASIC with 64 18.75Gb/s bidirectional ports for an aggregate off-chip bandwidth of 2.4Tb/s. Each port consists of three 6.25Gb/s differential signals in each direction. The router supports deterministic and adaptive packet routing with separate buffering for request and reply virtual channels. The router is organized hierarchically [13] as an 8×8 array of tiles which simplifies arbitration by avoiding long wires in the arbiters. Each tile of the array contains a router port, its associated buffering, and an 8×8 router subswitch. The router ASIC is implemented in a 90nm CMOS standard cell ASIC technology and went from concept to tapeout in 17 months.
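
The bandwidth and tile figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the constant and function names are hypothetical, and the reading that the 2.4Tb/s aggregate counts both directions of all 64 ports is an inference from the numbers, not a statement taken from the paper.

```python
# Figures quoted in the abstract; all names here are illustrative only.
PORTS = 64                    # router ports (radix)
SIGNALS_PER_DIRECTION = 3     # differential signals per port, per direction
SIGNAL_RATE_GBPS = 6.25       # rate of each differential signal
TILE_ROWS, TILE_COLS = 8, 8   # hierarchical tile array


def port_rate_gbps() -> float:
    """Per-port bandwidth in one direction, in Gb/s."""
    return SIGNALS_PER_DIRECTION * SIGNAL_RATE_GBPS


def aggregate_offchip_tbps() -> float:
    """Aggregate off-chip bandwidth in Tb/s.

    Assumption: the aggregate sums both directions of every port.
    """
    return PORTS * port_rate_gbps() * 2 / 1000


def tile_count() -> int:
    """Number of tiles in the array (one router port per tile)."""
    return TILE_ROWS * TILE_COLS


if __name__ == "__main__":
    print(f"per-port rate: {port_rate_gbps()} Gb/s per direction")
    print(f"aggregate off-chip bandwidth: {aggregate_offchip_tbps()} Tb/s")
    print(f"tiles: {tile_count()} for {PORTS} ports")
```

Under that assumption, three 6.25Gb/s signals reproduce the 18.75Gb/s port rate, the 64 bidirectional ports reproduce the 2.4Tb/s aggregate, and the 8×8 array provides exactly one tile per port.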