This paper describes the radix-64 folded-Clos network of the Cray BlackWidow scalable vector multiprocessor. We describe the BlackWidow network, which scales to 32K processors with a worst-case diameter of seven hops, and the underlying high-radix router microarchitecture and its implementation. By using a high-radix router with many narrow channels, we are able to take advantage of the higher pin density and faster signaling rates available in modern ASIC technology. The BlackWidow router is an 800 MHz ASIC with 64 18.75 Gb/s bidirectional ports for an aggregate off-chip bandwidth of 2.4 Tb/s. Each port consists of three 6.25 Gb/s differential signals in each direction. The router supports deterministic and adaptive packet routing with separate buffering for request and reply virtual channels. The router is organized hierarchically [13] as an 8×8 array of tiles, which simplifies arbitration by avoiding long wires in the arbiters. Each tile of the array contains a router port, its associated buffering, and an 8×8 router subswitch. The router ASIC is implemented in a 90 nm CMOS standard-cell ASIC technology and went from concept to tapeout in 17 months.
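The bandwidth figures quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not taken from the paper: the constant and function names are illustrative, and it assumes that the 2.4 Tb/s aggregate counts both directions of all 64 ports.

```python
# Figures quoted in the abstract; all names here are illustrative.
LANES_PER_DIRECTION = 3      # differential signals per port, per direction
LANE_RATE_GBPS = 6.25        # signaling rate of one differential signal
NUM_PORTS = 64               # router radix
TILE_ROWS, TILE_COLS = 8, 8  # hierarchical tile array


def port_bandwidth_gbps() -> float:
    """Bandwidth of one port in one direction, in Gb/s."""
    return LANES_PER_DIRECTION * LANE_RATE_GBPS


def aggregate_bandwidth_tbps() -> float:
    """Total off-chip bandwidth in Tb/s.

    Assumption: the aggregate sums both directions of every port.
    """
    per_direction_gbps = NUM_PORTS * port_bandwidth_gbps()
    return 2 * per_direction_gbps / 1000.0


def num_tiles() -> int:
    """Number of tiles in the array; each tile holds one router port."""
    return TILE_ROWS * TILE_COLS


if __name__ == "__main__":
    print(f"per-port, per-direction: {port_bandwidth_gbps()} Gb/s")
    print(f"aggregate off-chip:      {aggregate_bandwidth_tbps()} Tb/s")
    print(f"tiles: {num_tiles()} for {NUM_PORTS} ports")
```

Under that assumption the numbers are mutually consistent: three 6.25 Gb/s signals give 18.75 Gb/s per port in each direction, 64 ports in both directions give 2.4 Tb/s, and the 8×8 array supplies exactly one tile per port.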