Ray-tracing algorithms are known for producing highly realistic images, but at a significant computational cost. For this reason, a large body of research exists on techniques for accelerating them. One approach that has received comparatively little attention is the design of specialised ray-tracing hardware. The research that does exist on this topic has consistently demonstrated that dedicated microarchitectures can deliver significant performance and efficiency gains. However, previous work on hardware ray tracing has focused almost entirely on the traversal and intersection stages of the pipeline. As a result, the critical problem of constructing and managing acceleration data structures remains largely absent from the hardware literature. We propose that a specialised microarchitecture for this purpose could achieve considerable performance and efficiency improvements over programmable platforms. To this end, we have developed the first dedicated microarchitecture for the construction of binned SAH (surface area heuristic) BVHs (bounding volume hierarchies). Cycle-accurate simulations show that, compared to manycore implementations, our design achieves significant improvements in raw performance and in the bandwidth required for construction, as well as large efficiency gains in performance per clock and per unit of die area. We conclude that such a design would be useful in the context of a heterogeneous graphics processor, and may help future graphics processor designs to mitigate the utilisation limits that technology scaling is predicted to impose.
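
For readers unfamiliar with the term, the following is a minimal software sketch of one binned SAH split search along a single axis, the step that a BVH builder repeats at every node. It illustrates the general technique only and is not the microarchitecture described above. The names `AABB`, `binned_sah_split` and `num_bins` are hypothetical, and the cost used here (surface area times primitive count, summed over both children) leaves out the constant traversal term and the normalisation by parent area, neither of which changes which split is cheapest.

```python
from dataclasses import dataclass


@dataclass
class AABB:
    """Axis-aligned bounding box with 3-component corners (hypothetical helper)."""
    lo: tuple
    hi: tuple

    def union(self, other):
        return AABB(tuple(min(a, b) for a, b in zip(self.lo, other.lo)),
                    tuple(max(a, b) for a, b in zip(self.hi, other.hi)))

    def area(self):
        dx, dy, dz = (h - l for l, h in zip(self.lo, self.hi))
        return 2.0 * (dx * dy + dy * dz + dz * dx)

    def centroid(self, axis):
        return 0.5 * (self.lo[axis] + self.hi[axis])


def binned_sah_split(boxes, axis, num_bins=8):
    """Return (cost, split_bin, n_left) for the cheapest split, or None.

    A split at bin i puts bins 0..i in the left child and the rest in the right.
    """
    cents = [b.centroid(axis) for b in boxes]
    cmin, cmax = min(cents), max(cents)
    if cmax == cmin:
        return None  # all centroids coincide, so binning cannot separate them

    # 1. Binning pass: drop each primitive into a bin by its centroid.
    scale = num_bins / (cmax - cmin)
    counts = [0] * num_bins
    bounds = [None] * num_bins
    for box, c in zip(boxes, cents):
        i = min(num_bins - 1, int((c - cmin) * scale))
        counts[i] += 1
        bounds[i] = box if bounds[i] is None else bounds[i].union(box)

    def sweep(order):
        # Accumulate (area, count) over the bins visited so far.
        acc, n, out = None, 0, []
        for i in order:
            if bounds[i] is not None:
                acc = bounds[i] if acc is None else acc.union(bounds[i])
            n += counts[i]
            out.append((acc.area() if acc is not None else 0.0, n))
        return out

    # 2. Sweep passes: left[i] covers bins 0..i, right[i] covers bins i+1..end.
    left = sweep(range(num_bins - 1))
    right = sweep(range(num_bins - 1, 0, -1))[::-1]

    # 3. Evaluate the SAH cost of every candidate split and keep the cheapest.
    best = None
    for i in range(num_bins - 1):
        (area_l, n_l), (area_r, n_r) = left[i], right[i]
        cost = area_l * n_l + area_r * n_r
        if best is None or cost < best[0]:
            best = (cost, i, n_l)
    return best
```

A full builder would typically run this search on all three axes, partition the primitives at the winning split, and recurse on both children. The point of binning is that each node costs one pass over its primitives plus a fixed amount of work per bin, instead of a sort.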