Tradeoff between data-, instruction-, and thread-level parallelism in stream processors

  • Authors:
  • Jung Ho Ahn;Mattan Erez;William J. Dally

  • Affiliations:
  • Hewlett-Packard Laboratories, Palo Alto, California;University of Texas at Austin, Austin, Texas;Stanford University, Stanford, California

  • Venue:
  • Proceedings of the 21st annual international conference on Supercomputing
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper explores the scalability of the Stream Processor architecture along the instruction-, data-, and thread-level parallelism dimensions. We develop detailed VLSI-cost and processor-performance models for a multi-threaded Stream Processor and evaluate the tradeoffs, in both functionality and hardware costs, of mechanisms that exploit the different types of parallelism. We show that the hardware overhead of supporting coarse-grained independent threads of control is 15 -- 86% depending on machine parameters. We also demonstrate that the performance gains provided are of a smaller magnitude for a set of numerical applications. We argue that for stream applications with scalable parallel algorithms the performance is not very sensitive to the control structures used within a large range of area-efficient architectural choices. We evaluate the specific effects on performance of scaling along the different parallelism dimensions and explain the limitations of the ILP, DLP, and TLP hardware mechanisms.