Exploring the VLSI Scalability of Stream Processors

  • Authors:
  • Brucek Khailany;William J. Dally;Scott Rixner;Ujval J. Kapasi;John D. Owens;Brian Towles

  • Affiliations:
  • -;-;-;-;-;-

  • Venue:
  • HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
  • Year:
  • 2003

Quantified Score

Hi-index 0.01

Visualization

Abstract

Stream processors are high-performance programmable processors optimized to run media applications. Recent work has shown these processors to be more area- and energy-efficient than conventional programmable architectures. This paper explores the scalability of stream architectures to future VLSI technologies where over a thousand floating-point units on a single chip will be feasible. Two techniques for increasing the number of ALUs in a streamprocessor are presented: intracluster and intercluster scaling. These scaling techniques are shown to be cost-efficient to tens of ALUs per cluster and to hundreds of arithmetic clusters. A 640-ALU stream processor with 128 clusters and 5 ALUs per cluster is shown to be feasible in 45 nanometer technology, sustaining over 300 GOPS on kernels and providing 15.3x of kernel speedup and 8.0x of application speedup over a 40-ALU stream processor with a 2% degradation in area per ALU and a 7% degradation in energy dissipated per ALU operation.