Shared memory and message passing are two opposing communication models for parallel multicomputer architectures. Comparing such architectures has been difficult, because applications must be hand-crafted for each architecture, often resulting in radically different sources for comparison. While it is clear that shared memory machines are currently easier to program, in the future, programs will be written in high-level languages and compiled to the specific parallel target, thus eliminating this difference.

In this paper, we evaluate several parallel architecture alternatives (message passing, NUMA, and cache-coherent shared memory) for a collection of scientific benchmarks written in C*, a data-parallel language. Using a single suite of C* source programs, we compile each benchmark and simulate the interconnect for the alternative models. Our objective is to examine underlying, technology-independent costs inherent in each alternative. Our results show the relative work required to execute these data-parallel programs on the different architectures, and point out where some models have inherent advantages for particular data-parallel program styles.