Domain-specific programmable design of scalable streaming-array for power-efficient stencil computation

  • Authors: Kentaro Sano; Satoru Yamamoto; Yoshiaki Hatsuda
  • Affiliations: Sciences, Tohoku University; Sciences, Tohoku University; Kobo, Co., Ltd.
  • Venue: ACM SIGARCH Computer Architecture News
  • Year: 2011

Abstract

This paper presents a domain-specific programmable design of custom computing machines for high-performance stencil computation. Stencil computation is a typical kernel in scientific computing; however, its low operational intensity means that sustained performance is limited by memory bandwidth on recent microprocessors and GPUs. We have previously proposed a scalable streaming-array (SSA) of processing elements, which provides almost linear scalability as FPGAs are added while requiring only a constant external-memory bandwidth. To facilitate custom computing and to use hardware resources efficiently for varied and complex stencil computations, we design a programmable SSA with limited but necessary functionality. We present the design concept, the programmable structure, and the SIMD instruction set for the SSA. A prototype implementation with nine FPGAs demonstrates that our programmable design, with its many floating-point units, exploits hardware resources well, achieving 260 GFlop/s, which is 87.4% of the peak, at 1295 MFlop/s per watt.
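
To make the bandwidth argument and the reported figures concrete, the following is a minimal Python sketch and not the authors' implementation. The 4-neighbour averaging stencil, single-precision storage, and the cost model of one read and one write per grid point are illustrative assumptions; the abstract does not say which stencils were evaluated. The implied peak and power values are simple arithmetic on the three numbers the abstract reports and are not stated in the abstract itself.

```python
def jacobi_step(grid):
    """One sweep of a 4-neighbour averaging stencil over the interior of a 2D grid.

    Hypothetical example kernel; boundary values are copied unchanged.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = 0.25 * (
                grid[i - 1][j] + grid[i + 1][j] + grid[i][j - 1] + grid[i][j + 1]
            )
    return out


# Assumed cost model (not from the abstract): 3 additions + 1 multiplication
# per point, and one 4-byte read plus one 4-byte write per point, i.e. perfect
# on-chip reuse of neighbouring values.
FLOPS_PER_POINT = 4
BYTES_PER_POINT = 8
operational_intensity = FLOPS_PER_POINT / BYTES_PER_POINT  # flop per byte

# Figures reported in the abstract.
sustained_gflops = 260.0
fraction_of_peak = 0.874
mflops_per_watt = 1295.0

# Derived values (arithmetic only).
implied_peak_gflops = sustained_gflops / fraction_of_peak
implied_power_watts = sustained_gflops * 1000.0 / mflops_per_watt

if __name__ == "__main__":
    print(f"operational intensity: {operational_intensity:.2f} flop/byte")
    print(f"implied peak: {implied_peak_gflops:.1f} GFlop/s")
    print(f"implied power: {implied_power_watts:.1f} W")
```

Under these assumptions the kernel performs only half a floating-point operation per byte moved, which is why sustained performance on a conventional processor tends to follow memory bandwidth rather than arithmetic throughput. The derived values suggest a peak of roughly 297 GFlop/s and a power draw of roughly 200 W for the nine-FPGA prototype.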