Given the computing industry trend of increasing processing capacity by adding more cores to a chip, this work focuses on tuning the performance of a staple visualization algorithm, raycasting volume rendering, for shared-memory parallelism on multi-core CPUs and many-core GPUs. Our approach is to vary tunable algorithmic settings, along with known algorithmic optimizations and two different memory layouts, and to measure performance in terms of absolute runtime and L2 cache misses. Our results show wide variation in runtime performance on all platforms, as much as 254% for the tunable parameters we test on multi-core CPUs and 265% on many-core GPUs, and the optimal configurations vary across platforms, often in non-obvious ways. For example, the optimal configurations on the GPU occur at a crossover point between those that maintain good cache utilization and those that saturate computational throughput. This result is likely to be extremely difficult to predict with an empirical performance model for this particular algorithm, because the algorithm has an unstructured memory access pattern that varies locally for individual rays and globally for the selected viewpoint. Our results also show that optimal parameters on modern architectures are markedly different from those reported in previous studies run on older architectures. In addition, given the dramatic variation across platforms in both optimal algorithm settings and performance results, production visualization and analysis codes would clearly benefit from adopting auto-tuning as a performance optimization strategy. These benefits will likely become more pronounced as the number of cores per chip and the cost of moving data through the memory hierarchy both increase.
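
The tuning approach described above, running every combination of settings and timing each one, can be illustrated with a minimal sketch of an exhaustive parameter sweep. This is not the authors' code: `toy_kernel`, the parameter names `tile_w`, `tile_h`, and `layout`, and the candidate values in `SPACE` are all hypothetical stand-ins chosen for illustration, and the kernel merely walks a small image tile by tile instead of casting rays.

```python
import itertools
import time


def toy_kernel(tile_w, tile_h, layout, size=32):
    """Hypothetical stand-in for a renderer: visit each pixel once, tile by tile."""
    data = [1] * (size * size)
    total = 0
    for ty in range(0, size, tile_h):
        for tx in range(0, size, tile_w):
            for y in range(ty, min(ty + tile_h, size)):
                for x in range(tx, min(tx + tile_w, size)):
                    # Two illustrative index orders (assumed, not from the paper).
                    if layout == "row_major":
                        idx = y * size + x
                    else:
                        idx = x * size + y
                    total += data[idx]
    return total


def sweep(run, space, repeats=3):
    """Time run(**config) for every point in the Cartesian product of space."""
    names = list(space)
    results = []
    for values in itertools.product(*(space[n] for n in names)):
        config = dict(zip(names, values))
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run(**config)
            # Keep the fastest of several runs to reduce timing noise.
            best = min(best, time.perf_counter() - start)
        results.append((best, config))
    results.sort(key=lambda r: r[0])  # fastest configuration first
    return results


# Hypothetical search space; real candidates depend on the platform.
SPACE = {
    "tile_w": [1, 4, 16],
    "tile_h": [1, 4, 16],
    "layout": ["row_major", "column_major"],
}

results = sweep(toy_kernel, SPACE)
best_time, best_config = results[0]
print(f"fastest of {len(results)} configurations: {best_config} ({best_time:.6f} s)")
```

Because the fastest configuration depends on the machine, the sweep has to be rerun on each target platform. A real tuner would likely also record hardware counters such as cache misses alongside runtime, which this sketch omits.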