A rapid hierarchical radiosity algorithm
Proceedings of the 18th annual conference on Computer graphics and interactive techniques
Volume rendering on scalable shared-memory MIMD architectures
VVS '92 Proceedings of the 1992 workshop on Volume visualization
Working sets, cache sizes, and node granularity issues for large-scale multiprocessors
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Parallel hierarchical N-body methods and their implications for multiprocessors
Fast volume rendering using a shear-warp factorization of the viewing transformation
SIGGRAPH '94 Proceedings of the 21st annual conference on Computer graphics and interactive techniques
The directory-based cache coherence protocol for the DASH multiprocessor
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
A progressive refinement approach to fast radiosity image generation
SIGGRAPH '88 Proceedings of the 15th annual conference on Computer graphics and interactive techniques
Implications of hierarchical N-body methods for multiprocessor architectures
ACM Transactions on Computer Systems (TOCS)
Reducing false sharing on shared memory multiprocessors through compile time data transformations
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Real-time volume rendering on shared memory multiprocessors using the shear-warp factorization
PRS '95 Proceedings of the IEEE symposium on Parallel rendering
Efficient parallel global illumination using density estimation
PRS '95 Proceedings of the IEEE symposium on Parallel rendering
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
CRL: high-performance all-software distributed shared memory
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Coarse-grained parallelism for hierarchical radiosity using group iterative methods
SIGGRAPH '96 Proceedings of the 23rd annual conference on Computer graphics and interactive techniques
Towards efficiency and portability: programming with the BSP model
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
A performance evaluation of cluster architectures
SIGMETRICS '97 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Improving parallel shear-warp volume rendering on shared address space multiprocessors
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Exploiting deep parallel memory hierarchies for ray casting volume rendering
PRS '97 Proceedings of the IEEE symposium on Parallel rendering
Parallel hierarchical computation of specular radiosity
PRS '97 Proceedings of the IEEE symposium on Parallel rendering
A parallel hierarchical radiosity algorithm for complex scenes
PRS '97 Proceedings of the IEEE symposium on Parallel rendering
Performance analysis on a CC-NUMA prototype
IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
Thread scheduling for multiprogrammed multiprocessors
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Predicting the performance of distributed virtual shared-memory applications
IBM Systems Journal
I3D '99 Proceedings of the 1999 symposium on Interactive 3D graphics
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Scaling application performance on a cache-coherent multiprocessor
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
ICS '99 Proceedings of the 13th international conference on Supercomputing
Scheduling threads for low space requirement and good locality
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
On the partitionability of hierarchical radiosity
PVGS '99 Proceedings of the 1999 IEEE symposium on Parallel visualization and graphics
Proceedings of the ACM 2000 conference on Java Grande
Hybrid sort-first and sort-last parallel rendering with a cluster of PCs
HWWS '00 Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware
Accelerating shared virtual memory via general-purpose network interface support
ACM Transactions on Computer Systems (TOCS)
A comparison of three programming models for adaptive applications on the Origin2000
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Towards a first vertical prototyping of an extremely fine-grained parallel programming approach
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
A hierarchical load-balancing framework for dynamic multithreaded computations
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Pthreads for dynamic and irregular parallelism
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
WOSP '02 Proceedings of the 3rd international workshop on Software and performance
A comparison of three programming models for adaptive applications on the Origin2000
Journal of Parallel and Distributed Computing
International Journal of Parallel Programming
Analysis of a Parallel Volume Rendering System Based on the Shear-Warp Factorization
IEEE Transactions on Visualization and Computer Graphics
A Perceptually-Driven Parallel Algorithm for Efficient Radiosity Simulation
IEEE Transactions on Visualization and Computer Graphics
PVR: High-Performance Volume Rendering
IEEE Computational Science & Engineering
A Fully Compliant OpenMP Implementation on Software Distributed Shared Memory
HiPC '02 Proceedings of the 9th International Conference on High Performance Computing
A Hierarchical Parallel Processing System for the Multipass-Rendering Method
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
View Caching: Efficient Software Shared Memory for Dynamic Computations
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Message Passing Vs. Shared Address Space on a Cluster of SMPs
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Efficient synchronization for nonuniform communication architectures
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Message passing and shared address space parallelism on an SMP cluster
Parallel Computing
Parallel ray tracing on a chip
Practical parallel rendering
Supporting High Level Programming with High Performance: The Illinois Concert System
HIPS '97 Proceedings of the 1997 Workshop on High-Level Programming Models and Supportive Environments (HIPS '97)
Hierarchical Backoff Locks for Nonuniform Communication Architectures
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Journal of Parallel and Distributed Computing
Solving irregularly structured problems based on distributed object model
Parallel Computing - Special issue: Parallel and distributed scientific and engineering computing
Performance Evaluation of Task Pools Based on Hardware Synchronization
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
SIGGRAPH '05 ACM SIGGRAPH 2005 Courses
Fine-Grained Task Scheduling Using Adaptive Data Structures
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Kendo: efficient deterministic multithreading in software
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
A tool for layered analysing and understanding of distributed programs
Computer Communications
Revisiting parallel rendering for shared memory machines
EG PGV'11 Proceedings of the 11th Eurographics conference on Parallel Graphics and Visualization
Towards a scalable architecture for real-time volume rendering
EGGH'95 Proceedings of the Tenth Eurographics conference on Graphics Hardware
Recently, a new class of scalable, shared-address-space multiprocessors has emerged. Like message-passing machines, these multiprocessors have a distributed interconnection network and physically distributed main memory. However, they provide hardware support for efficient implicit communication through a shared address space, and they automatically exploit temporal locality by caching both local and remote data in a processor's hardware cache. In this article, we show that these architectural characteristics make it much easier to obtain very good speedups on the best known visualization algorithms. Simple and natural parallelizations work very well, the sequential implementations do not have to be fundamentally restructured, and the high degree of temporal locality obviates the need for explicit data distribution and communication management. We demonstrate our claims through parallel versions of three state-of-the-art algorithms: a recent hierarchical radiosity algorithm by Hanrahan et al. (1991), a parallelized ray-casting volume renderer by Levoy (1992), and an optimized ray-tracer by Spach and Pulleyblank (1992). We also discuss a new shear-warp volume rendering algorithm that provides the first demonstration of interactive frame rates for a 256×256×256 voxel data set on a general-purpose multiprocessor.