Active thread compaction for GPU path tracing

Authors: Ingo Wald
Affiliations: Intel
Venue: Proceedings of the ACM SIGGRAPH Symposium on High Performance Graphics
Year: 2011

Abstract

Modern GPUs like NVIDIA's Fermi internally operate in a SIMD manner by ganging multiple (32) scalar threads together into SIMD warps; if a warp's threads diverge, the warp serially executes both branches, temporarily disabling threads that are not on that path. In this paper, we explore and thoroughly analyze the concept of active thread compaction (i.e., the process of taking multiple partially filled warps and compacting them into fewer but fully utilized warps) in the context of a CUDA path tracer. Our results show that this technique can indeed lead to significant improvements in SIMD utilization, and corresponding savings in the amount of work performed; however, they also show that certain inadequacies of today's hardware wipe out most of the achieved gains, leaving bottom-up speed-ups of a mere 12-16%. We believe our analysis of why this is the case will provide insight to other researchers experimenting with this technique in different contexts.
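
As a rough illustration of the compaction idea described in the abstract, the following is a minimal CPU-side sketch in Python, not the paper's CUDA implementation. It assumes a warp can be modelled as a list of 32 lanes, each holding a path id or `None` for a disabled lane; the names `utilization` and `compact` are hypothetical. The sketch only shows how SIMD utilization changes when surviving paths are repacked, and it does not model the cost of performing the compaction on real hardware.

```python
WARP_SIZE = 32  # lanes per warp, as stated in the abstract for Fermi


def utilization(warps):
    """Fraction of lanes doing useful work, counted over warps that must still run.

    A warp with no active lane is assumed to retire and is ignored.
    """
    live = [w for w in warps if any(p is not None for p in w)]
    if not live:
        return 0.0
    used = sum(p is not None for w in live for p in w)
    return used / (len(live) * WARP_SIZE)


def compact(warps):
    """Gather all active path ids and repack them densely into as few warps as possible."""
    survivors = [p for w in warps for p in w if p is not None]
    packed = []
    for start in range(0, len(survivors), WARP_SIZE):
        chunk = survivors[start:start + WARP_SIZE]
        # Pad the last warp with disabled lanes so every warp has WARP_SIZE entries.
        packed.append(chunk + [None] * (WARP_SIZE - len(chunk)))
    return packed


# Hypothetical input: four warps in which only every fourth lane still
# carries a live path (8 active lanes out of 32 in each warp).
warps = [
    [w * WARP_SIZE + lane if lane % 4 == 0 else None for lane in range(WARP_SIZE)]
    for w in range(4)
]

before = utilization(warps)   # 32 active lanes spread over 4 warps
packed = compact(warps)
after = utilization(packed)   # the same 32 paths packed into 1 warp

print(f"warps: {len(warps)} -> {len(packed)}")
print(f"SIMD utilization: {before:.2f} -> {after:.2f}")
```

In this toy input the four quarter-filled warps collapse into a single full warp, so the utilization measure rises from 0.25 to 1.0. Whether such a gain translates into wall-clock speed-up is exactly the question the paper analyzes.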