Data and Computation Transformations for Brook Streaming Applications on Multiprocessors

  • Authors:
  • Shih-wei Liao;Zhaohui Du;Gansha Wu;Guei-Yuan Lueh

  • Affiliations:
  • Intel Corporation;Intel Corporation;Intel Corporation;Intel Corporation

  • Venue:
  • Proceedings of the International Symposium on Code Generation and Optimization
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

Multicore processors are about to become prevalent in the PC world. Meanwhile, over 90% of the computing cycles are estimated to be consumed by streaming media applications [24]. Although stream programming exposes parallelism naturally, we found that achieving high performance on multiprocessors is challenging. Therefore, we develop a parallel compiler for the Brook streaming language with aggressive data and computation transformations. First, we formulate fifteen Brook stream operators in terms of systems of inequalities. Our compiler optimizes the modeled operators to improve memory footprint and performance. Second, the stream computation including both kernels and operators is mapped to the affine partitioning model by modeling each kernel as an implicit loop nest over stream elements. Note that our general abstraction is not limited to Brook. Our modeling and transformations yield high performance on uniprocessors as well. The geometric mean of speedups is 4.7 on ten streaming applications on a Xeon. On multiprocessors, we show that exploiting the standard intra-kernel data parallelism is inferior to our general modeling. The former yields a speedup of 1.5 for ten applications on a 4-way Xeon, while the latter achieves a speedup of 6.4 over the same baseline. We show that our compiler effectively reduces memory footprint, exploits parallelism, and circumvents phase-ordering issues.