On-chip network design considerations for compute accelerators

  • Authors:
  • Ali Bakhoda;John Kim;Tor M. Aamodt

  • Affiliations:
  • University of British Columbia, Canada, BC, Canada;KAIST, Daejeon, South Korea;University of British Columbia, Vancouver, BC, Canada

  • Venue:
  • Proceedings of the 19th international conference on Parallel architectures and compilation techniques
  • Year:
  • 2010

Abstract

There has been little work investigating the overall performance impact of on-chip communication in manycore compute accelerators. In this paper, we evaluate the performance of a GPU-like compute accelerator running CUDA workloads and consisting of compute nodes, an interconnection network, and a graphics DRAM memory system, using detailed cycle-level simulation. First, we study the performance of a baseline architecture employing a scalable mesh network. We then propose several microarchitectural techniques that exploit the communication characteristics of these applications while providing a cost-effective (i.e., low-area) on-chip network. Instead of increasing costly bisection bandwidth, we increase the number of injection ports at the memory controller router nodes, raising terminal bandwidth at these few nodes. In addition, we propose a novel "checkerboard" on-chip network that alternates between conventional full-routers and half-routers with limited connectivity. This network is enabled by the limited communication of the many-to-few traffic pattern. We describe a minimal routing algorithm for the checkerboard network that does not increase the hop count.
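To make the checkerboard idea concrete, the following is a minimal illustrative sketch, not the authors' implementation. The abstract does not say which mesh positions hold full-routers or what "limited connectivity" means precisely, so the sketch assumes (1) full-routers sit where x + y is even, and (2) a half-router can forward a packet straight through, inject it, or eject it, but cannot turn it from one dimension to the other. Under those assumptions, a route stays minimal if it uses whichever dimension order (XY or YX) places its single turn at a full-router. All function names here are hypothetical.

```python
def is_full_router(x, y):
    """Assumed placement: full-routers where x + y is even, half-routers elsewhere."""
    return (x + y) % 2 == 0


def layout(width, height):
    """Return the mesh as rows of 'F' (full-router) and 'h' (half-router)."""
    return [
        "".join("F" if is_full_router(x, y) else "h" for x in range(width))
        for y in range(height)
    ]


def minimal_hops(src, dst):
    """Hop count of any minimal route in a 2D mesh (Manhattan distance)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def pick_dimension_order(src, dst):
    """Pick a dimension order whose single turn lands on a full-router.

    Returns 'XY', 'YX', or None when neither single-turn minimal route
    satisfies the assumed no-turn restriction at half-routers.
    """
    (sx, sy), (dx, dy) = src, dst
    if sx == dx or sy == dy:
        return "XY"  # straight line: no turn is needed at all
    if is_full_router(dx, sy):  # XY routing turns at (dx, sy)
        return "XY"
    if is_full_router(sx, dy):  # YX routing turns at (sx, dy)
        return "YX"
    return None  # both candidate turn nodes are half-routers


if __name__ == "__main__":
    for row in layout(4, 4):
        print(row)
    print(pick_dimension_order((0, 0), (1, 2)), minimal_hops((0, 0), (1, 2)))
```

In this sketch, a `None` result marks source and destination pairs for which both candidate turn nodes are half-routers; a complete routing algorithm would need an additional rule for those pairs, which this example does not attempt to reproduce.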