High Throughput Data Mapping for Coarse-Grained Reconfigurable Architectures

  • Authors:
  • Yongjoo Kim; Jongeun Lee; Aviral Shrivastava; Jonghee W. Yoon; Doosan Cho; Yunheung Paek

  • Affiliations:
  • Department of Electrical Engineering and Computer Science, Seoul National University, Seoul, Korea; School of Electrical and Computer Engineering, Ulsan National Institute of Science and Technology, Ulsan, Korea; Department of Computer Science and Engineering, School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Phoenix, AZ, USA; Department of Electrical Engineering and Computer Science, Seoul National University, Seoul, Korea; Department of Electronic Engineering, Sunchon National University, Sunchon, Korea; Department of Electrical Engineering and Computer Science, Seoul National University, Seoul, Korea

  • Venue:
  • IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
  • Year:
  • 2011

Abstract

Coarse-grained reconfigurable arrays (CGRAs) are a very promising platform, providing both power efficiency of up to 10–100 MOps/mW and software programmability. However, this promise of CGRAs critically hinges on the effectiveness of application mapping onto CGRA platforms. While previous solutions have greatly improved computation speed, they have largely ignored the impact of the local memory architecture on the achievable power and performance. This paper motivates the need for memory-aware application mapping for CGRAs, and proposes an effective application-mapping solution that considers the effects of various memory architecture parameters, including the number of banks, the local memory size, and the communication bandwidth between the local memory and the external main memory. Further, we propose efficient methods to handle dependent data on a double-buffering local memory, which is necessary for recurrent loops. Our proposed solution achieves a 59% reduction in the energy-delay product, which factors into about 47% and 22% reductions in energy consumption and runtime, respectively, compared to memory-unaware mapping for realistic local memory architectures. We also show that our scheme scales across a range of applications and memory parameters, and that the runtime overhead of handling recurrent loops with our proposed methods can be less than 1%.
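The double-buffering scheme mentioned in the abstract can be illustrated with a generic sketch: while the kernel computes on one local bank, the next tile of data is transferred into the other bank, and the banks are swapped each iteration. This is a minimal, hypothetical illustration of the general technique, not the paper's actual mapping algorithm; all names (`TILE`, `bank`, `fetch_tile`) are assumptions for the example, and the "DMA" is simulated with a plain copy.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch of double buffering between a large "external"
 * memory and two small local banks. In a real CGRA the fetch would be a
 * DMA transfer overlapped with computation; here it is a simple memcpy. */

#define TILE 4
#define N    16

static int external_in[N];
static int external_out[N];
static int bank[2][TILE];               /* the two local buffers */

/* Stand-in for a DMA transfer from main memory into a local bank. */
static void fetch_tile(int *dst, int start) {
    memcpy(dst, &external_in[start], TILE * sizeof(int));
}

/* Placeholder kernel operating on one local tile. */
static void compute_tile(const int *src, int start) {
    for (int i = 0; i < TILE; ++i)
        external_out[start + i] = src[i] * 2;
}

void process(void) {
    int cur = 0;
    fetch_tile(bank[cur], 0);           /* prologue: prefetch tile 0 */
    for (int t = 0; t < N / TILE; ++t) {
        int nxt = cur ^ 1;
        if (t + 1 < N / TILE)           /* prefetch next tile into the   */
            fetch_tile(bank[nxt], (t + 1) * TILE); /* idle bank          */
        compute_tile(bank[cur], t * TILE);
        cur = nxt;                      /* swap banks for next iteration */
    }
}
```

Handling loop-carried (dependent) data, as the paper addresses for recurrent loops, is harder than this sketch suggests, because a value written in iteration t may be needed from the bank that is being refilled in iteration t+1.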