GPU-optimized volume ray tracing for massive numbers of rays in radiotherapy

  • Authors:
  • Bo Zhou;Kai Xiao;Danny Z. Chen;X. Sharon Hu

  • Affiliations:
  • University of Maryland and Fudan University, Shanghai, China;University of Notre Dame;University of Notre Dame;University of Notre Dame

  • Venue:
  • ACM Transactions on Embedded Computing Systems (TECS)
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Ray tracing within a uniform grid volume is a fundamental process invoked frequently by many applications, especially radiation-dose calculation methods in radiotherapy. However, the conflicting features between the GPU memory architecture and the memory-accessing patterns of volume ray tracing lead to inefficient usage of GPU memory bandwidth and waste of capability of modern GPUs. To improve the ray tracing performance on GPU, we propose a lookup-table-based ray tracing method which is specially optimized towards the GPU memory system for processing a massive number of rays. The proposed method is based on a key observation that many of these applications normally involves a massive number of rays, but their ray tracing may not need to follow a specific execution order. Therefore, we divide the 3D space into many regions (called pyramids) and group together the rays falling into the same pyramid. For each ray group, the volume is rotated and resampled for their raytracing. This divide-and-rotate strategy allows the memory access of the ray tracing process to adopt a table-lookup approach and leads to better memory coalescing on GPU. Our proposed method was thoroughly evaluated in four volume setups with randomly-generated rays. The collapsed-cone convolution/superposition (CCCS) dose calculation method is also implemented with/without the proposed approach to verify the feasibility of our method. Compared with the direct GPU implementation of the popular 3DDDA algorithm, our method provides a speedup in the range of 1.91--2.94X for the volume settings we used. Major performance factors, including ray origins, volume size, and pyramid size, are also analyzed. The proposed technique was also found to be able to give a speedup of 1.61--2.17X over the original GPU implementation of the CCCS algorithm. Our experiment results indicate that the proposed approach is capable of offering better coalesced memory access which eventually boosts the raytracing performance on GPU. Moreover, our approach is conceptually simple and can be readily included into various applications.