Parallel Visualization Algorithms: Performance and Architectural Implications

Authors:
Jaswinder Pal Singh;Anoop Gupta;Marc Levoy
Venue:
Computer
Year:
1994

Citing 7
Cited 61

A rapid hierarchical radiosity algorithm

Proceedings of the 18th annual conference on Computer graphics and interactive techniques
Volume rendering on scalable shared-memory MIMD architectures

VVS '92 Proceedings of the 1992 workshop on Volume visualization
Working sets, cache sizes, and node granularity issues for large-scale multiprocessors

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Parallel hierarchical N-body methods and their implications for multiprocessors

Parallel hierarchical N-body methods and their implications for multiprocessors
Fast volume rendering using a shear-warp factorization of the viewing transformation

SIGGRAPH '94 Proceedings of the 21st annual conference on Computer graphics and interactive techniques
The directory-based cache coherence protocol for the DASH multiprocessor

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
A progressive refinement approach to fast radiosity image generation

SIGGRAPH '88 Proceedings of the 15th annual conference on Computer graphics and interactive techniques

Implications of hierarchical N-body methods for multiprocessor architectures

ACM Transactions on Computer Systems (TOCS)
Reducing false sharing on shared memory multiprocessors through compile time data transformations

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Real-time volume rendering on shared memory multiprocessors using the shear-warp factorization

PRS '95 Proceedings of the IEEE symposium on Parallel rendering
Efficient parallel global illumination using density estimation

PRS '95 Proceedings of the IEEE symposium on Parallel rendering
SM-prof: a tool to visualise and find cache coherence performance bottlenecks in multiprocessor programs

Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
CRL: high-performance all-software distributed shared memory

SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Coarse-grained parallelism for hierarchical radiosity using group iterative methods

SIGGRAPH '96 Proceedings of the 23rd annual conference on Computer graphics and interactive techniques
Towards efficiency and portability: programming with the BSP model

Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
A performance evaluation of cluster architectures

SIGMETRICS '97 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Application restructuring and performance portability on shared virtual memory and hardware-coherent multiprocessors

PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Improving parallel shear-warp volume rendering on shared address space multiprocessors

PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Exploiting deep parallel memory hierarchies for ray casting volume rendering

PRS '97 Proceedings of the IEEE symposium on Parallel rendering
Parallel hierarchical computation of specular radiosity

PRS '97 Proceedings of the IEEE symposium on Parallel rendering
A parallel hierarchical radiosity algorithm for complex scenes

PRS '97 Proceedings of the IEEE symposium on Parallel rendering
Performance analysis on a CC-NUMA prototype

IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
Thread scheduling for multiprogrammed multiprocessors

Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Predicting the performance of distributed virtual shared-memory applications

IBM Systems Journal
Interactive ray tracing

I3D '99 Proceedings of the 1999 symposium on Interactive 3D graphics
Using network interface support to avoid asynchronous protocol processing in shared virtual memory systems

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Scaling application performance on a cache-coherent multiprocessor

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
A comparison of MPI, SHMEM and cache-coherent shared address space programming models on the SGI Origin2000

ICS '99 Proceedings of the 13th international conference on Supercomputing
Scheduling threads for low space requirement and good locality

Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
On the partitionability of hierarchical radiosity

PVGS '99 Proceedings of the 1999 IEEE symposium on Parallel visualization and graphics
Developing a practical parallel multi-pass renderer in Java and C++: toward a Grande application in Java

Proceedings of the ACM 2000 conference on Java Grande
Hybrid sort-first and sort-last parallel rendering with a cluster of PCs

HWWS '00 Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware
Accelerating shared virtual memory via general-purpose network interface support

ACM Transactions on Computer Systems (TOCS)
A comparison of three programming models for adaptive applications on the Origin2000

Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Towards a first vertical prototyping of an extremely fine-grained parallel programming approach

Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
A hierarchical load-balancing framework for dynamic multithreaded computations

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Pthreads for dynamic and irregular parallelism

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Integrating non-blocking synchronisation in parallel applications: performance advantages and methodologies

WOSP '02 Proceedings of the 3rd international workshop on Software and performance
Parallel Shear-Warp Factorization Volume Rendering Using Efficient 1-D and 2-D Partitioning Schemes for Distributed Memory Multicomputers

The Journal of Supercomputing
A comparison of three programming models for adaptive applications on the origin2000

Journal of Parallel and Distributed Computing
A Comparison of MPI, SHMEM and Cache-Coherent Shared Address Space Programming Models on a Tightly-Coupled Multiprocessors

International Journal of Parallel Programming
Ray Casting on Shared-Memory Architectures: Memory-Hierarchy Considerations in Volume Rendering

IEEE Concurrency
Analysis of a Parallel Volume Rendering System Based on the Shear-Warp Factorization

IEEE Transactions on Visualization and Computer Graphics
A Perceptually-Driven Parallel Algorithm for Efficient Radiosity Simulation

IEEE Transactions on Visualization and Computer Graphics
PVR: High-Performance Volume Rendering

IEEE Computational Science & Engineering
Parallel Implementations of Probabilistic Inference

Computer
A Fully Compliant OpenMP Implementation on Software Distributed Shared Memory

HiPC '02 Proceedings of the 9th International Conference on High Performance Computing
A Hierarchical Parallel Processing System for the Multipass-Rendering Method

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
View Caching: Efficient Software Shared Memory for Dynamic Computations

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Message Passing Vs. Shared Address Space on a Cluster of SMPs

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
A Rotate-Tiling Image Composition Method for Parallel Volume Rendering on Distributed Memory Multicomputers

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Efficient synchronization for nonuniform communication architectures

Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Message passing and shared address space parallelism on an SMP cluster

Parallel Computing
Parallel ray tracing on a chip

Practical parallel rendering
Supporting High Level Programming with High Performance: The Illinois Concert System

HIPS '97 Proceedings of the 1997 Workshop on High-Level Programming Models and Supportive Environments (HIPS '97)
Hierarchical Backoff Locks for Nonuniform Communication Architectures

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Shared virtual memory clusters: bridging the cost-performance gap between SMPs and hardware DSM systems

Journal of Parallel and Distributed Computing
Solving irregularly structured problems based on distributed object model

Parallel Computing - Special issue: Parallel and distributed scientific and engineering computing
Performance Evaluation of Task Pools Based on Hardware Synchronization

Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Interactive ray tracing

SIGGRAPH '05 ACM SIGGRAPH 2005 Courses
TRLE--an efficient data compression scheme for image composition of volume rendering on distributed memory multicomputers

The Journal of Supercomputing
Fine-Grained Task Scheduling Using Adaptive Data Structures

Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Kendo: efficient deterministic multithreading in software

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
A tool for layered analysing and understanding of distributed programs

Computer Communications
Revisiting parallel rendering for shared memory machines

EG PGV'11 Proceedings of the 11th Eurographics conference on Parallel Graphics and Visualization
Towards a scalable architecture for real-time volume rendering

EGGH'95 Proceedings of the Tenth Eurographics conference on Graphics Hardware
Visualizing 3D/4D environmental data using many-core graphics processing units (GPUs) and multi-core central processing units (CPUs)

Computers & Geosciences

Quantified Score

Hi-index	4.10

Visualization

Abstract

Recently, a new class of scalable, shared-address-space multiprocessors has emerged. Like message-passing machines, these multiprocessors have a distributed interconnection network and physically distributed main memory. However, they provide hardware support for efficient implicit communication through a shared address space, and they automatically exploit temporal locality by caching both local and remote data in a processor's hardware cache. In this article, we show that these architectural characteristics make it much easier to obtain very good speedups on the best known visualization algorithms. Simple and natural parallelizations work very well, the sequential implementations do not have to be fundamentally restructured, and the high degree of temporal locality obviates the need for explicit data distribution and communication management. We demonstrate our claims through parallel versions of three state-of-the-art algorithms: a recent hierarchical radiosity algorithm by Hanrahan et al. (1991), a parallelized ray-casting volume renderer by Levoy (1992), and an optimized ray-tracer by Spach and Pulleyblank (1992). We also discuss a new shear-warp volume rendering algorithm that provides the first demonstration of interactive frame rates for a 256×256×256 voxel data set on a general-purpose multiprocessor.