Network within a network approach to create a scalable high-radix router microarchitecture

  • Authors:
  • Jung Ho Ahn;Sungwoo Choo;John Kim

  • Affiliations:
  • Dept. of Intelligent Convergence Systems, Seoul National University;Dept. of Intelligent Convergence Systems, Seoul National University;Dept. of Computer Science & Web Science Technology Division, KAIST

  • Venue:
  • HPCA '12 Proceedings of the 2012 IEEE 18th International Symposium on High-Performance Computer Architecture
  • Year:
  • 2012

Abstract

Cost-efficient networks are critical in creating scalable large-scale systems, including those found in supercomputers and datacenters. High-radix routers reduce network cost by lowering the network diameter while providing high bisection bandwidth and path diversity. However, as the port count increases, the high-radix router microarchitecture itself needs to scale efficiently. A hierarchical crossbar organization, in which a single large crossbar is partitioned into many small crossbars, has been proposed to overcome the limitations of conventional switch microarchitectures. Although this organization provides high performance, its scalability is limited by the power and area overheads of its wires and intermediate buffers. We propose alternative scalable router microarchitectures that build a network within the switch of the high-radix router itself. These designs lower the wiring complexity and buffer requirements. For example, when a folded-Clos switch is used instead of the hierarchical crossbar switch in a radix-64 router, it provides up to 73%, 58%, and 87% reductions in area, energy-delay product, and energy-delay-area product, respectively. We also explore more efficient switch designs by exploiting the traffic-pattern characteristics of the global network and their impact on the local network design within the switch. In particular, we propose a bilateral butterfly switch organization that has fewer crossbars and half as many global wires as the topology-agnostic folded-Clos switch while achieving better low-load latency and equivalent saturation throughput.
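Counting components makes the contrast between a hierarchical crossbar and a network of small crossbars inside the switch more concrete. The sketch below is a back-of-the-envelope model under textbook assumptions, not a reproduction of the paper's designs. It assumes a radix-k switch built from subswitches of size p, with a buffer at every crossbar input other than the router's own input ports. It also uses a standard unfolded three-stage Clos (the paper uses a folded-Clos) and a generic p-ary butterfly (not the bilateral butterfly proposed here). The function and class names are hypothetical, and the counts say nothing about wire length, area, or energy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwitchCounts:
    crossbars: int             # number of small crossbars (subswitches)
    intermediate_buffers: int  # buffered crossbar input ports inside the switch


def hierarchical_crossbar(k: int, p: int) -> SwitchCounts:
    """k x k crossbar tiled into (k/p)^2 subswitches of size p x p (assumed model)."""
    if k % p:
        raise ValueError("k must be a multiple of p")
    tiles = (k // p) ** 2
    # Assumption: every subswitch input port has its own intermediate buffer.
    return SwitchCounts(crossbars=tiles, intermediate_buffers=tiles * p)


def clos_three_stage(k: int, p: int) -> SwitchCounts:
    """Unfolded three-stage Clos: k/p ingress (p x p), p middle ((k/p) x (k/p)),
    and k/p egress (p x p) crossbars (assumed model, not the paper's folded-Clos)."""
    if k % p:
        raise ValueError("k must be a multiple of p")
    edge = k // p
    # Assumption: buffers at the k middle-stage inputs and the k egress-stage inputs.
    return SwitchCounts(crossbars=2 * edge + p, intermediate_buffers=2 * k)


def butterfly(k: int, p: int) -> SwitchCounts:
    """Generic p-ary butterfly: log_p(k) stages of k/p crossbars, each p x p."""
    if p < 2:
        raise ValueError("p must be at least 2")
    stages, ports = 0, 1
    while ports < k:  # compute log_p(k) by repeated multiplication
        ports *= p
        stages += 1
    if ports != k:
        raise ValueError("k must be a power of p")
    # Assumption: every stage after the first buffers its k input ports.
    return SwitchCounts(crossbars=stages * (k // p), intermediate_buffers=(stages - 1) * k)


if __name__ == "__main__":
    for name, model in [
        ("hierarchical crossbar", hierarchical_crossbar),
        ("three-stage Clos", clos_three_stage),
        ("p-ary butterfly", butterfly),
    ]:
        c = model(64, 8)
        print(f"{name:22s} crossbars={c.crossbars:3d} buffers={c.intermediate_buffers}")
```

Under these assumptions, a radix-64 switch built from 8 x 8 crossbars needs 64 crossbars and 512 intermediate buffers as a hierarchical crossbar, 24 crossbars and 128 buffers as a three-stage Clos, and 16 crossbars and 64 buffers as a two-stage butterfly. The quadratic growth of the tiled design against the roughly linear growth of the multistage designs is the intuition behind lowering buffer requirements; the actual area and energy savings depend on the wiring and buffer sizing evaluated in the paper.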