StreamRay: a stream filtering architecture for coherent ray tracing

Authors:
Karthik Ramani;Christiaan P. Gribble;Al Davis
Affiliations:
University of Utah, Salt Lake City, UT, USA;Grove City College, Grove City, PA, USA;University of Utah, Salt Lake City, UT, USA
Venue:
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Year:
2009



Abstract

The wide availability of commodity graphics processors has made real-time graphics an intrinsic component of the human/computer interface. These graphics cores accelerate the z-buffer algorithm and provide a highly interactive experience at relatively low cost. However, many applications in entertainment, science, and industry require high-quality lighting effects such as accurate shadows, reflection, and refraction. These effects can be difficult to achieve with z-buffer algorithms but are straightforward to implement with ray tracing. Although ray tracing is computationally more expensive, the algorithm scales well and exposes abundant parallelism. Nevertheless, its memory access patterns are difficult to predict, so the promised parallel speedup is hard to achieve in practice. This paper presents a novel approach to ray tracing based on stream filtering, together with StreamRay, a multicore wide-SIMD microarchitecture that delivers interactive frame rates of 15-32 frames/second for scenes of high geometric complexity and exhibits high utilization for SIMD widths ranging from 8 to 16 elements. StreamRay consists of two main components: the ray engine, which is responsible for stream assembly and employs address generation units that generate the addresses used to form large SIMD vectors, and the filter engine, which implements the ray tracing operations with programmable accelerators. Results demonstrate that separating address and data processing reduces data movement and resource contention. Performance improves by 56% while simultaneously providing 11.63% power savings per accelerator core, compared to a design that does not use separate resources for address and data computations.
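The abstract does not spell out the mechanics of stream filtering, so the following Python sketch is only one plausible reading of the idea, not the StreamRay design itself. It assumes a stream is a list of ray indices, that a filter applies a test to one SIMD-width group at a time, and that rays passing the test are compacted into a new output stream. The names `Ray`, `hits_box`, `filter_stream`, and `simd_utilization` are hypothetical, and the ray/box slab test stands in for whatever operations the filter engine actually performs.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Ray:
    """Hypothetical ray record: an origin and a direction."""
    origin: Vec3
    direction: Vec3


def hits_box(ray: Ray, lo: Vec3, hi: Vec3) -> bool:
    """Standard slab test of a ray against an axis-aligned box (assumed filter test)."""
    t_near, t_far = 0.0, float("inf")
    for o, d, l, h in zip(ray.origin, ray.direction, lo, hi):
        if d == 0.0:
            # Ray is parallel to this slab: it must start inside it.
            if o < l or o > h:
                return False
            continue
        t0, t1 = (l - o) / d, (h - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def filter_stream(ray_ids: Sequence[int], rays: Sequence[Ray],
                  test: Callable[[Ray], bool], simd_width: int = 8) -> List[int]:
    """Filter a stream of ray indices, one SIMD-width group at a time."""
    out: List[int] = []
    for start in range(0, len(ray_ids), simd_width):
        lane_ids = ray_ids[start:start + simd_width]   # address side: pick the lanes
        mask = [test(rays[i]) for i in lane_ids]       # data side: evaluate the test
        out.extend(i for i, keep in zip(lane_ids, mask) if keep)  # compact survivors
    return out


def simd_utilization(stream_length: int, simd_width: int) -> float:
    """Fraction of SIMD lanes doing useful work when a stream is packed densely."""
    if stream_length == 0:
        return 0.0
    vectors = -(-stream_length // simd_width)  # ceiling division
    return stream_length / (vectors * simd_width)
```

Under these assumptions, filtering ten parallel rays against a box that only four of them hit yields a four-element output stream. Packing that stream into 8-wide vectors leaves half the lanes idle, which illustrates why assembling long, dense streams matters for the SIMD utilization the abstract reports.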