Optimization of interconnects between accelerators and shared memories in dark silicon

  • Authors:
  • Jason Cong; Bingjun Xiao

  • Affiliations:
  • University of California, Los Angeles, California (both authors)

  • Venue:
  • Proceedings of the International Conference on Computer-Aided Design
  • Year:
  • 2013

Abstract

Application-specific accelerators provide orders-of-magnitude improvements in energy efficiency over CPUs, and accelerator-rich computing platforms show promise in the dark silicon age. Memory sharing among accelerators yields large transistor savings, but it requires novel designs for the interconnects between accelerators and shared memories. Accelerators run 100x faster than CPUs and pose a high demand for data. If we follow the same design rules as for interconnects between CPUs and shared memories, and simply duplicate the interconnect hardware to meet the accelerator data demand, the result is resource-consuming interconnects. In this work we develop a novel design of interconnects between accelerators and shared memories that exploits three optimization opportunities emerging in accelerator-rich computing platforms: 1) the multiple data ports of the same accelerator are powered on/off together, so the competition for shared resources among these ports can be eliminated to save interconnect transistor cost; 2) in dark silicon, the number of active accelerators on an accelerator-rich platform is usually limited, so the interconnects can be partially populated to just fit the data-access demand allowed by the power budget; 3) the heterogeneity of accelerators leads to execution patterns among accelerators, and based on probability analysis that identifies these patterns, the interconnects can be optimized for their expected utilization. Experiments show that our interconnect design outperforms prior work optimized for CPU cores or signal routing.
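The second opportunity above can be illustrated with a back-of-the-envelope sizing comparison. The sketch below is not from the paper; all function names and numbers are hypothetical, and it only shows why sizing the crossbar for the power-limited number of simultaneously active ports, rather than for all ports, reduces switch count.

```python
# Illustrative sketch (hypothetical, not the paper's design): switch counts
# for a full crossbar vs. one partially populated for the power budget.

def full_crossbar_switches(n_ports: int, n_banks: int) -> int:
    """Full crossbar: every accelerator data port reaches every memory bank."""
    return n_ports * n_banks

def partial_crossbar_switches(max_active_ports: int, n_banks: int) -> int:
    """Partially populated crossbar: routing only for the ports that can be
    simultaneously active under the dark-silicon power budget."""
    return max_active_ports * n_banks

if __name__ == "__main__":
    n_ports, n_banks = 64, 32   # hypothetical platform size
    max_active = 16             # hypothetical power budget: 16 active ports
    full = full_crossbar_switches(n_ports, n_banks)
    partial = partial_crossbar_switches(max_active, n_banks)
    print(full, partial, full // partial)  # 2048 512 4
```

Under these assumed numbers, populating the interconnect only for the 16 power-permitted active ports cuts the switch count by 4x relative to a full crossbar.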