The performance of full-featured ray tracers has historically been limited by the hardware's floating-point computational power. However, next-generation multi-threaded, multi-core architectures promise to provide sufficient CPU throughput to support real-time frame rates. In such systems, limited memory system performance, in terms of both on-chip cache capacity and DRAM-to-cache bandwidth, is likely to bound overall system performance. This paper presents a novel ray tracing algorithm that both improves cache utilization and reduces DRAM-to-cache bandwidth usage. The key insight is to view ray traversal as a scheduling problem, which allows our algorithm to match ray traversal and intersection computations to the available system resources. Using a detailed simulator, we show that our algorithm significantly reduces the amount of data brought into the cache, in exchange for the small overhead of maintaining the ray schedule. Moreover, our algorithm creates units of work that are more amenable to parallelization than those of traditional Whitted-style ray tracers.
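The scheduling view can be illustrated with a toy model. The sketch below is not the paper's algorithm or simulator; it only assumes that the scene is split into numbered regions, that each ray is reduced to the list of regions it visits, and that the cache is a small LRU store of regions. The names `LRUCache`, `trace_per_ray`, `trace_scheduled`, and the sample paths are hypothetical. It counts how many region loads occur when rays are traced one at a time versus when rays are queued per region and the fullest queue is processed first.

```python
from collections import OrderedDict, defaultdict


class LRUCache:
    """Toy cache holding at most `capacity` scene regions (an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = OrderedDict()
        self.loads = 0  # number of regions fetched from "DRAM"

    def touch(self, region):
        if region in self.slots:
            self.slots.move_to_end(region)  # hit: mark as most recently used
            return
        self.loads += 1                     # miss: fetch the region
        self.slots[region] = True
        if len(self.slots) > self.capacity:
            self.slots.popitem(last=False)  # evict the least recently used


def trace_per_ray(paths, capacity):
    """Depth-first baseline: finish each ray before starting the next."""
    cache = LRUCache(capacity)
    for path in paths:
        for region in path:
            cache.touch(region)
    return cache.loads


def trace_scheduled(paths, capacity):
    """Queue rays per region and always process the fullest queue."""
    cache = LRUCache(capacity)
    queues = defaultdict(list)  # region -> [(ray_id, step), ...]
    for ray_id, path in enumerate(paths):
        if path:
            queues[path[0]].append((ray_id, 0))
    while queues:
        # Pick the region with the most waiting rays; break ties by lowest id.
        region = max(queues, key=lambda r: (len(queues[r]), -r))
        batch = queues.pop(region)
        cache.touch(region)  # one load serves the whole batch
        for ray_id, step in batch:
            nxt = step + 1
            if nxt < len(paths[ray_id]):
                queues[paths[ray_id][nxt]].append((ray_id, nxt))
    return cache.loads


# Four rays crossing three regions in opposite directions, one-region cache.
paths = [[0, 1, 2], [2, 1, 0], [0, 1, 2], [2, 1, 0]]
print("per-ray loads:  ", trace_per_ray(paths, capacity=1))
print("scheduled loads:", trace_scheduled(paths, capacity=1))
```

In this toy setting the scheduled variant never needs more loads than the per-ray baseline on the sample input, at the cost of maintaining the per-region queues, which mirrors the trade-off described in the abstract. Each batch popped from a queue is also an independent unit of work, which hints at why such a schedule may parallelize more easily.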