Conditional techniques for stream processing kernels

  • Authors:
  • William J. Dally;Ujval J. Kapasi

  • Affiliations:
  • -;-

  • Venue:
  • Conditional techniques for stream processing kernels
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

The stream programming model casts applications as a set of sequential data streams that are operated on by data-parallel computation kernels. Previous work has shown that this model is a powerful representation for media processing applications, such as image- and signal-processing, because it captures the locality and concurrency inherent to an application. This careful handling of important application properties results in kernels that are compute-intensive—i.e., kernels that perform a large number of arithmetic operations per unit of inter-kernel communication bandwidth. Furthermore, the stream model can be implemented with efficient VLSI designs, such as the Imagine Programmable Stream Processor. The Imagine chip supports 48 ALUs on a single die, operating at over 200MHz. This large number of ALUs provides a high peak performance, but makes efficiently executing kernels with conditional code a challenge. We will introduce two techniques for efficiently supporting kernel conditionals, such as if-statements, case-statements, and while-loops: conditional routing and conditional streams. Conditional routing is a code transformation that exploits the trade-off of increasing inter-kernel communication in order to increase the performance of kernel inner-loops containing conditional code. The second technique we will discuss is the use of a conditional stream, which is a mechanism to reduce the amount of load-imbalance that arises between parallel processing clusters on a single stream processor chip. Load-imbalance results when different conditional paths are taken on different processing clusters, and causes one or more of the clusters to wait idle for the others to complete a kernel. We will also present a case study of the impact of these techniques on a programmable polygon rendering pipeline that contains many unpredictable conditionals. We show that our techniques improve the performance of this application by 1.9×.