Evolving semiconductor and circuit technology has greatly increased the pin bandwidth available to a router chip. In the early 1990s, routers were limited to 10 Gb/s of pin bandwidth. Today 1 Tb/s is feasible, and we expect 20 Tb/s of I/O bandwidth by 2010. A high-radix router that provides many narrow ports converts this pin bandwidth into reduced latency and reduced cost more effectively than a router built with a few wide ports. However, increasing the radix (or degree) of a router raises several challenges, because internal switches and allocators scale as the square of the radix. This paper addresses these challenges by proposing and evaluating alternative microarchitectures for high-radix routers. We show that a hierarchical switch organization with per-virtual-channel buffers in each subswitch saves 40% of the area of a fully buffered crossbar and increases throughput by 20-60% over a conventional crossbar implementation.
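
The scaling argument in the abstract can be illustrated with a small sketch. It is not the paper's area or throughput model, only a counting exercise under stated assumptions: the pin budget is split evenly across ports, a flat crossbar has one crosspoint per input/output pair, and a hierarchical switch is tiled from square subswitches whose size `subswitch_size` (8 here) is a hypothetical choice. The function names are illustrative and the 1 Tb/s figure is taken from the abstract.

```python
def port_bandwidth_gbps(pin_bandwidth_gbps: float, radix: int) -> float:
    """Bandwidth per port when a fixed pin budget is split evenly over `radix` ports."""
    return pin_bandwidth_gbps / radix


def crossbar_crosspoints(radix: int) -> int:
    """Crosspoints in a flat radix x radix crossbar; grows as the square of the radix."""
    return radix * radix


def subswitch_count(radix: int, subswitch_size: int) -> int:
    """Number of p x p subswitches needed to tile a k x k switch.

    Assumption (not from the source): subswitch_size divides radix evenly.
    """
    if radix % subswitch_size != 0:
        raise ValueError("subswitch_size must divide radix")
    per_side = radix // subswitch_size
    return per_side * per_side


if __name__ == "__main__":
    PIN_BANDWIDTH_GBPS = 1000.0  # 1 Tb/s, the figure quoted in the abstract
    SUBSWITCH_SIZE = 8           # hypothetical subswitch size, chosen for illustration
    for radix in (8, 16, 32, 64):
        print(
            f"radix={radix:3d}  "
            f"port={port_bandwidth_gbps(PIN_BANDWIDTH_GBPS, radix):8.3f} Gb/s  "
            f"crosspoints={crossbar_crosspoints(radix):5d}  "
            f"subswitches={subswitch_count(radix, SUBSWITCH_SIZE):3d}"
        )
```

Doubling the radix halves the per-port bandwidth but quadruples the crosspoint count, which is the quadratic growth that motivates alternative switch organizations.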