Exploiting Narrow Accelerators with Data-Centric Subgraph Mapping

  • Authors:
  • Amir Hormati;Nathan Clark;Scott Mahlke

  • Affiliations:
  • University of Michigan - Ann Arbor;University of Michigan - Ann Arbor;University of Michigan - Ann Arbor

  • Venue:
  • Proceedings of the International Symposium on Code Generation and Optimization
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

The demand for high performance has driven acyclic computation accelerators into extensive use in modern embedded and desktop architectures. Accelerators that are ideal from a software perspective, are difficult or impossible to integrate in many modern architectures, though, due to area and timing requirements. This reality is coupled with the observation that many application domains under-utilize accelerator hardware, because of the narrow data they operate on and the nature of their computation. In this work, we take advantage of these facts to design accelerators capable of executing in modern architectures by narrowing datapath width and reducing interconnect. Novel compiler techniques are developed in order to generate highquality code for the reduced-cost accelerators and prevent performance loss to the extent possible. First, data width profiling is used to statistically determine how wide program data will be at run time. This information is used by the subgraph mapping algorithm to optimally select subgraphs for execution on targeted narrow accelerators. Overall, our data-centric compilation techniques achieve on average 6.5%, and up to 12%, speed up over previous subgraph mapping algorithms for 8-bit accelerators. We also show that, with appropriate compiler support, the increase in the total number of execution cycles in reduced-interconnect accelerators is less than 1% of the fully-connected accelerator.