Microarchitecture of a High-Radix Router

Authors:
John Kim; William J. Dally; Brian Towles; Amit K. Gupta
Affiliations:
Stanford University; Stanford University; D.E. Shaw Research and Development; Stanford University
Venue:
Proceedings of the 32nd Annual International Symposium on Computer Architecture
Year:
2005

Abstract

Evolving semiconductor and circuit technology has greatly increased the pin bandwidth available to a router chip. In the early 1990s, routers were limited to 10 Gb/s of pin bandwidth. Today 1 Tb/s is feasible, and we expect 20 Tb/s of I/O bandwidth by 2010. A high-radix router that provides many narrow ports is more effective at converting pin bandwidth into reduced latency and reduced cost than the alternative of building a router with a few wide ports. However, increasing the radix (or degree) of a router raises several challenges, because the cost of the internal switch and allocators grows as the square of the radix. This paper addresses these challenges by proposing and evaluating alternative microarchitectures for high-radix routers. We show that a hierarchical switch organization with per-virtual-channel buffers in each subswitch enables a 40% reduction in area compared to a fully buffered crossbar and a 20-60% increase in throughput compared to a conventional crossbar implementation.
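The abstract makes two quantitative arguments: spreading a fixed pin bandwidth over more, narrower ports lowers latency up to a point, and crossbar cost grows as the square of the radix. The Python sketch below illustrates both under stated assumptions; it is not the paper's evaluation. It assumes a first-order zero-load latency model T = H * t_r + L / b, with hop count H = 2 * log_k(N) for a folded Clos/butterfly of N terminals and per-port bandwidth b = B / k. The parameter values (4096 terminals, 1 Tb/s, 20 ns per hop, 256-bit packets) are hypothetical. For the hierarchical organization it assumes each p x p subswitch buffers its p inputs and p outputs. Buffer counts alone do not reproduce the reported 40% area figure, since area also depends on wiring and buffer sizing. All function names are hypothetical.

```python
import math

# --- Radix vs. latency (illustrative first-order model, an assumption) ---

def network_latency_ns(k, n_terminals, pin_bw_gbps, t_router_ns, packet_bits):
    """Zero-load latency T = H * t_r + L / b for radix-k routers.

    H = 2 * log_k(N): hops up and back down a folded Clos/butterfly,
        treated as a continuous quantity for simplicity.
    b = B / k: per-port bandwidth when total pin bandwidth B is split
        evenly across k ports. 1 Gb/s is 1 bit/ns, so L / b is in ns.
    """
    hops = 2 * math.log(n_terminals) / math.log(k)
    port_bw_gbps = pin_bw_gbps / k
    return hops * t_router_ns + packet_bits / port_bw_gbps


def best_power_of_two_radix(n_terminals, pin_bw_gbps, t_router_ns, packet_bits,
                            max_radix=1024):
    """Return the power-of-two radix that minimizes the model latency."""
    candidates = [2 ** i for i in range(1, int(math.log2(max_radix)) + 1)]
    return min(candidates, key=lambda k: network_latency_ns(
        k, n_terminals, pin_bw_gbps, t_router_ns, packet_bits))


# --- Quadratic cost of the switch (illustrative counts) ---

def crossbar_crosspoints(k):
    """A k x k crossbar has k^2 crosspoints, so doubling k quadruples them."""
    return k * k


def fully_buffered_crossbar_buffers(k):
    """One buffer per crosspoint: k^2 buffers (per virtual channel)."""
    return k * k


def hierarchical_crossbar_buffers(k, p):
    """Buffers for a (k/p) x (k/p) array of p x p subswitches, ASSUMING each
    subswitch buffers its p inputs and its p outputs: (k/p)^2 * 2p = 2k^2/p."""
    if k % p:
        raise ValueError("subswitch size p must divide radix k")
    return (k // p) ** 2 * 2 * p


if __name__ == "__main__":
    # Hypothetical parameters: 4096 terminals, 1 Tb/s of pin bandwidth,
    # 20 ns per router hop, 256-bit packets.
    params = dict(n_terminals=4096, pin_bw_gbps=1000.0,
                  t_router_ns=20.0, packet_bits=256)
    for k in (4, 16, 64, 256):
        print(f"k={k:3d}: {network_latency_ns(k, **params):6.1f} ns")
    print("best power-of-two radix:", best_power_of_two_radix(**params))
    print("crosspoints, k=32 vs k=64:",
          crossbar_crosspoints(32), crossbar_crosspoints(64))
    print("buffers, k=64:", fully_buffered_crossbar_buffers(64),
          "fully buffered vs", hierarchical_crossbar_buffers(64, 8),
          "hierarchical (p=8)")
```

With these made-up parameters the model favors a radix of 64. Below that, hop count dominates. Above it, the serialization latency of ever-narrower ports takes over. Under the stated buffering assumption, grouping a 64-port crossbar into 8 x 8 subswitches cuts the buffer count from k^2 to 2k^2/p, a factor of p/2.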