Remote memory access: A case for portable, efficient and library independent parallel programming

Authors:
Alexandros V. Gerbessiotis;Seung-Yeop Lee
Affiliations:
CS Department, New Jersey Institute of Technology, Newark, NJ 07102, USA (Corresponding author: alexg@cs.njit.edu);CS Department, New Jersey Institute of Technology, Newark, NJ 07102, USA
Venue:
Scientific Programming
Year:
2004

Abstract

In this work we make a strong case for remote memory access (RMA) as an effective way to program a parallel computer, by proposing a framework that supports RMA in a library-independent, simple and intuitive way. Parallel code written with our approach runs transparently under both MPI-2-enabled libraries and bulk-synchronous parallel (BSP) libraries. The advantages of using RMA are code simplicity, reduced programming complexity, and increased efficiency. We support these claims by implementing under this framework a collection of benchmark programs: a communication and synchronization performance assessment program, a dense matrix multiplication algorithm, and two variants of a parallel radix-sort algorithm. We examine their performance on a Linux-based PC cluster under three different RMA-enabled libraries: LAM MPI, BSPlib, and PUB. We conclude that implementations of such parallel algorithms using RMA communication primitives lead to code that is as efficient as the equivalent message-passing code and, in the case of radix-sort, substantially more efficient. In addition, our work can be used as a comparative study of the relevant capabilities of the three libraries.