As supercomputers and datacenters grow in size, cost-efficient networks become critical to scaling these systems. High-radix routers reduce network cost by lowering the network diameter while providing high bisection bandwidth and path diversity. The building blocks of these large-scale networks are the routers (or switches), which must scale with increasing port count and pin bandwidth. As the port count increases, however, the high-radix router microarchitecture itself must also scale efficiently. A hierarchical crossbar switch organization has been proposed in which the single large crossbar used as the router switch is partitioned into many small crossbars, overcoming the limitations of the conventional router microarchitecture. Although this organization provides high performance, its scalability is limited by the excessive power and area overheads of the wires and intermediate buffers. In this article, we propose scalable router microarchitectures that leverage a network within the switch design of the high-radix routers themselves. These alternative designs lower the wiring complexity and the buffer requirements. For example, when a folded-Clos switch is used instead of the hierarchical crossbar switch in a radix-64 router, it reduces area, energy-delay product, and energy-delay-area product by up to 73%, 58%, and 87%, respectively. We also explore more efficient switch designs that exploit the traffic-pattern characteristics of the global network and their impact on the local network design within the switch, for both folded-Clos and flattened butterfly networks. In particular, we propose a bilateral butterfly switch organization that has fewer crossbars and global wires than the topology-agnostic folded-Clos switch while achieving better low-load latency and equivalent saturation throughput.
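The three metrics quoted above are related by definition: the energy-delay product is energy times delay, and the energy-delay-area product is that value times area. The following minimal sketch shows how percentage reductions in area and energy-delay product would compose into an energy-delay-area reduction if both were measured at the same design point. The function names are hypothetical and not from the article, and the same-design-point condition is an assumption made only for illustration. The abstract's figures are each stated as upper bounds ("up to"), so they need not come from one configuration and exact agreement with the composed value is not expected.

```python
def reduction_to_ratio(reduction_pct):
    """Convert a percentage reduction into the fraction that remains."""
    return 1.0 - reduction_pct / 100.0


def ratio_to_reduction(ratio):
    """Convert a remaining fraction back into a percentage reduction."""
    return (1.0 - ratio) * 100.0


def combined_edap_reduction(area_reduction_pct, edp_reduction_pct):
    """Compose area and EDP reductions into an EDAP reduction.

    Assumption (illustrative only): both reductions are measured at the
    same design point, so EDAP = EDP * area and the ratios multiply.
    """
    ratio = (reduction_to_ratio(area_reduction_pct)
             * reduction_to_ratio(edp_reduction_pct))
    return ratio_to_reduction(ratio)


if __name__ == "__main__":
    # Area and EDP reductions quoted in the abstract (73% and 58%).
    print(round(combined_edap_reduction(73, 58), 1))
```

Under that assumption the composed value is about 88.7%, in the same range as the reported 87% reduction in energy-delay-area product.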