PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
This paper presents a new parallel volume rendering algorithm and implementation, based on the shear-warp factorization, for shared address space multiprocessors. Starting from an existing parallel shear-warp renderer, we use increasingly detailed performance measurements on real machines and simulators to understand its performance bottlenecks. This analysis leads to a new parallel implementation that substantially outperforms and out-scales the old one on a range of shared address space platforms: bus-based centralized-memory machines, hardware-coherent distributed-memory machines, and networks of computers connected by page-based shared virtual memory. The results demonstrate that real-time volume rendering is promising on general-purpose multiprocessors, and illustrate the utility of tool hierarchies, used in conjunction with algorithmic and application knowledge, for understanding memory system interactions and improving parallel algorithms.
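For readers unfamiliar with the shear-warp factorization that the renderer builds on, the following is a minimal sketch of its shear step in the standard formulation. It is not the parallel algorithm described in this paper. The function names (`principal_axis`, `shear_coefficients`, `shear`) are hypothetical, and NumPy is an assumed dependency. The viewing direction is assumed to be given in object (volume) coordinates. The principal axis k is the object axis most parallel to that direction. Each slice at position z along k is translated by (s_i z, s_j z) so that every viewing ray becomes parallel to k. Slices can then be composited into an intermediate image with simple per-slice offsets, and a final 2-D warp produces the screen image. Resampling, compositing, the warp, and the partitioning of work among processors (the paper's actual subject) are deliberately omitted.

```python
import numpy as np


def principal_axis(view_dir):
    """Object-space axis (0, 1 or 2) most parallel to the viewing direction."""
    return int(np.argmax(np.abs(view_dir)))


def shear_coefficients(view_dir):
    """Return (k, i, j, s_i, s_j) for the shear step of shear-warp.

    k is the principal axis; i and j are the two remaining axes taken in
    cyclic order. A slice at position z along axis k is translated by
    (s_i * z, s_j * z), which makes every viewing ray parallel to axis k.
    """
    v = np.asarray(view_dir, dtype=float)
    k = principal_axis(v)
    i, j = (k + 1) % 3, (k + 2) % 3
    s_i = -v[i] / v[k]
    s_j = -v[j] / v[k]
    return k, i, j, s_i, s_j


def shear(point, view_dir):
    """Apply the (linear) shear to a 3-D object-space point or direction."""
    p = np.asarray(point, dtype=float)
    k, i, j, s_i, s_j = shear_coefficients(view_dir)
    out = p.copy()
    out[i] += s_i * p[k]
    out[j] += s_j * p[k]
    return out


if __name__ == "__main__":
    view = np.array([0.3, -0.2, 0.9])
    k, i, j, s_i, s_j = shear_coefficients(view)
    print("principal axis:", k, "shear coefficients:", (s_i, s_j))
    # After shearing, the viewing direction has (numerically) no i or j component.
    print("sheared view direction:", shear(view, view))
```

Because the shear has no translation term, the same function maps both points and directions. Applying it to the viewing direction itself is a quick check that the sheared rays run straight along the principal axis.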