StreamRay: a stream filtering architecture for coherent ray tracing

  • Authors:
  • Karthik Ramani; Christiaan P. Gribble; Al Davis

  • Affiliations:
  • University of Utah, Salt Lake City, UT, USA; Grove City College, Grove City, PA, USA; University of Utah, Salt Lake City, UT, USA

  • Venue:
  • Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems
  • Year:
  • 2009

Abstract

The wide availability of commodity graphics processors has made real-time graphics an intrinsic component of the human/computer interface. These graphics cores accelerate the z-buffer algorithm and provide a highly interactive experience at a relatively low cost. However, many applications in entertainment, science, and industry require high-quality lighting effects such as accurate shadows, reflection, and refraction. These effects can be difficult to achieve with z-buffer algorithms but are straightforward to implement using ray tracing. Although ray tracing is computationally more complex, the algorithm exhibits excellent scaling and parallelism properties. Nevertheless, ray tracing memory access patterns are difficult to predict, and the promised parallel speedup is therefore hard to achieve. This paper highlights a novel approach to ray tracing based on stream filtering and presents StreamRay, a multicore, wide-SIMD microarchitecture that delivers interactive frame rates of 15-32 frames/second for scenes of high geometric complexity and exhibits high utilization for SIMD widths ranging from 8 to 16 elements. StreamRay consists of two main components: the ray engine, which is responsible for stream assembly and employs address generation units that generate addresses to form large SIMD vectors, and the filter engine, which implements the ray tracing operations with programmable accelerators. Results demonstrate that separating address and data processing reduces data movement and resource contention. Performance improves by 56% while simultaneously providing 11.63% power savings per accelerator core compared to a design that does not use separate resources for address and data computations.
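
The abstract names stream filtering and SIMD utilization without showing what either looks like in practice. The following is a minimal software sketch of the general idea, not the StreamRay hardware design: a stream of ray indices is partitioned by a test, and the rays that pass are gathered into a new dense stream so that SIMD lanes stay occupied. The function names, the predicate, and the utilization formula are all illustrative assumptions rather than anything taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple


def filter_stream(ray_ids: Sequence[int],
                  passes: Callable[[int], bool]) -> Tuple[List[int], List[int]]:
    """Partition a stream of ray indices by a test (hypothetical helper).

    Returns (kept, dropped), each preserving the input order.
    """
    kept: List[int] = []
    dropped: List[int] = []
    for rid in ray_ids:
        (kept if passes(rid) else dropped).append(rid)
    return kept, dropped


def simd_utilization(stream_len: int, width: int) -> float:
    """Fraction of lanes doing useful work when a dense stream is processed
    in chunks of `width` lanes; only the last chunk can be partly empty.
    This is an assumed, simplified definition of utilization."""
    if stream_len == 0:
        return 0.0
    chunks = -(-stream_len // width)  # ceiling division
    return stream_len / (chunks * width)


if __name__ == "__main__":
    # Assumption: 20 rays, and a stand-in test that every third ray passes.
    kept, dropped = filter_stream(range(20), lambda rid: rid % 3 == 0)
    print(kept)
    print(simd_utilization(len(kept), 8))
    print(simd_utilization(len(kept), 16))
```

Under this simplified model, a short filtered stream fills a narrow SIMD unit better than a wide one, which is one reason utilization is worth reporting across several SIMD widths.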