The wide availability of commodity graphics processors has made real-time graphics an intrinsic component of the human/computer interface. These graphics cores accelerate the z-buffer algorithm and provide a highly interactive experience at relatively low cost. However, many applications in entertainment, science, and industry require high-quality lighting effects such as accurate shadows, reflection, and refraction. These effects can be difficult to achieve with z-buffer algorithms but are straightforward to implement using ray tracing. Although ray tracing is computationally more complex, the algorithm exhibits excellent scaling and parallelism properties. In practice, however, ray tracing memory access patterns are difficult to predict, so the promised parallel speedup is hard to achieve. This paper presents a novel approach to ray tracing based on stream filtering and introduces StreamRay, a multicore wide-SIMD microarchitecture that delivers interactive frame rates of 15-32 frames/second for scenes of high geometric complexity and exhibits high utilization for SIMD widths ranging from 8 to 16 elements. StreamRay consists of two main components: the ray engine, which is responsible for stream assembly and uses address generation units to generate the addresses that form large SIMD vectors, and the filter engine, which implements the ray tracing operations with programmable accelerators. Results demonstrate that separating address and data processing reduces data movement and resource contention. Performance improves by 56%, with a simultaneous 11.63% power savings per accelerator core, compared to a design that does not use separate resources for address and data computations.
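To make the idea of stream filtering concrete, the following is a minimal software sketch, not the StreamRay hardware or the authors' implementation. It assumes a stream is a plain list of rays, the filter is a ray/box slab test, and the stream is processed in chunks of a chosen SIMD width; the names `hits_box`, `filter_stream`, and `simd_utilization` are hypothetical and introduced here only for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Ray = Tuple[Vec3, Vec3]  # (origin, direction)
Box = Tuple[Vec3, Vec3]  # (min corner, max corner)


def hits_box(ray: Ray, box: Box) -> bool:
    """Slab test: True if the ray (t >= 0) intersects the axis-aligned box."""
    (origin, direction), (lo, hi) = ray, box
    t_near, t_far = 0.0, math.inf
    for o, d, l, h in zip(origin, direction, lo, hi):
        if d == 0.0:
            # Ray is parallel to this slab: it must start inside it.
            if o < l or o > h:
                return False
            continue
        t0, t1 = (l - o) / d, (h - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def filter_stream(stream: Sequence[Ray],
                  test: Callable[[Ray], bool],
                  simd_width: int) -> List[Ray]:
    """Apply `test` to the stream one SIMD-width chunk at a time and
    return a compacted output stream holding only the passing rays."""
    out: List[Ray] = []
    for start in range(0, len(stream), simd_width):
        chunk = stream[start:start + simd_width]
        mask = [test(r) for r in chunk]  # one conceptual SIMD operation
        out.extend(r for r, keep in zip(chunk, mask) if keep)
    return out


def simd_utilization(n_elements: int, simd_width: int) -> float:
    """Fraction of SIMD lanes doing useful work when a densely packed
    stream of n_elements is processed in vectors of simd_width lanes."""
    if n_elements == 0:
        return 0.0
    vectors = -(-n_elements // simd_width)  # ceiling division
    return n_elements / (vectors * simd_width)


# Usage: alternate rays that hit and miss a unit-ish box around the origin.
box: Box = ((-1.0, -1.0, -1.0), (1.0, 1.0, 1.0))
hit: Ray = ((0.0, 0.0, -5.0), (0.0, 0.0, 1.0))
miss: Ray = ((5.0, 5.0, -5.0), (0.0, 0.0, 1.0))
rays = [hit, miss] * 5
survivors = filter_stream(rays, lambda r: hits_box(r, box), simd_width=8)
```

Because the output stream is compacted, later stages operate only on rays that passed the test, and lanes go idle only in the final, partially filled vector. This is the property the sketch is meant to illustrate; how the ray engine assembles such streams in hardware is not described by this code.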