Sort-last parallel rendering is widely used. Recent GPU developments make a PC equipped with multiple GPUs a viable alternative to a high-cost supercomputer: the Fermi architecture supports unified virtual addressing on each GPU, providing a foundation for non-uniform memory access (NUMA) on multi-GPU platforms. Such hardware changes call for the parallel rendering algorithms to be reconsidered. In this paper, we thoroughly investigate the NUMA-aware image compositing problem, the key final stage of sort-last parallel rendering. Building on the proven radix-k strategy, we identify an optimal compositing algorithm that takes advantage of the NUMA architecture of the multi-GPU platform. We quantitatively analyze different image compositing modes for practical use, taking into account the peer-to-peer communication costs between GPUs. Our experiments on various datasets show that our image compositing method is very fast: eight GPUs can composite an image of a few megapixels in about 10 ms.
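For readers unfamiliar with the radix-k strategy mentioned above, the following is a minimal sketch of how its communication schedule can be derived. It is not the NUMA-aware algorithm of this paper. It assumes the number of processes (here, GPUs) equals the product of the per-round group sizes, and that groups are formed from the mixed-radix digits of each rank, which is one common way to define them. The function names `radix_k_groups` and `pixels_after_round` are hypothetical and chosen for illustration only.

```python
def radix_k_groups(rank, k_values):
    """Return, for each round, the ranks that exchange image pieces with `rank`.

    Assumption: the process count equals the product of k_values, and the
    group in round i consists of the ranks that differ from `rank` only in
    digit i of its mixed-radix representation.
    """
    groups = []
    stride = 1  # distance between neighbouring group members in this round
    for k in k_values:
        digit = (rank // stride) % k      # position of `rank` inside its group
        base = rank - digit * stride      # group member whose digit is 0
        groups.append([base + j * stride for j in range(k)])
        stride *= k
    return groups


def pixels_after_round(total_pixels, k_values):
    """Pixels each process still owns after every round.

    Each round splits the currently owned region into k pieces, so the
    owned region shrinks by a factor of k (assumed to divide evenly).
    """
    sizes = []
    remaining = total_pixels
    for k in k_values:
        remaining //= k
        sizes.append(remaining)
    return sizes


if __name__ == "__main__":
    # Eight processes factored as 2 x 4: two rounds instead of three.
    print(radix_k_groups(5, [2, 4]))
    # With every k equal to 2 the schedule reduces to binary swap.
    print(radix_k_groups(5, [2, 2, 2]))
    print(pixels_after_round(1 << 20, [2, 4]))
```

Choosing the factorization (for example 2 x 4, 4 x 2, or a single round of 8) trades the number of rounds against the number of messages per round. On a multi-GPU machine this is where peer-to-peer link costs would plausibly enter the choice, although the sketch itself does not model them.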