Program transformation and runtime support for threaded MPI execution on shared-memory machines

Authors:
Hong Tang;Kai Shen;Tao Yang
Affiliations:
Univ. of California, Santa Barbara;Univ. of California, Santa Barbara;Univ. of California, Santa Barbara
Venue:
ACM Transactions on Programming Languages and Systems (TOPLAS)
Year:
2000


Abstract

Parallel programs written in MPI have been widely used for developing high-performance applications on various platforms. Because of a restriction of the MPI computation model, conventional MPI implementations on shared-memory machines map each MPI node to an OS process, which can suffer serious performance degradation in the presence of multiprogramming. This paper studies compile-time and runtime techniques for enhancing the performance portability of MPI code running on multiprogrammed shared-memory machines. The proposed techniques allow MPI nodes to be executed safely and efficiently as threads. Compile-time transformation eliminates global and static variables in C code by using node-specific data. The runtime support includes an efficient and provably correct communication protocol that uses lock-free data structures and takes advantage of address space sharing among threads. Experiments on an SGI Origin 2000 show that our MPI prototype, called TMPI, which uses the proposed techniques, is competitive with SGI's native MPI implementation in a dedicated environment, and that it has significant performance advantages in a multiprogrammed environment.
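To make the variable-elimination idea concrete, the following is a minimal sketch of how a global in C code might be replaced by per-node storage when each MPI node runs as a thread. It assumes POSIX thread-specific data as the mechanism; the names `counter_key`, `counter_ptr`, `bump`, and `run_nodes` are hypothetical and are not taken from TMPI. It illustrates the general technique, not the paper's actual transformation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Process-per-node code might contain:
 *     static int counter = 0;
 *     void bump(void) { counter++; }
 * If nodes become threads, that single `counter` would be shared by all
 * nodes. The sketch below gives each node (thread) a private copy. */

static pthread_key_t counter_key;                      /* hypothetical name */
static pthread_once_t counter_once = PTHREAD_ONCE_INIT;

static void make_counter_key(void) {
    /* free() is called on each node's private copy when its thread exits. */
    pthread_key_create(&counter_key, free);
}

/* Returns the calling node's private counter, allocating it on first use. */
static int *counter_ptr(void) {
    pthread_once(&counter_once, make_counter_key);
    int *p = pthread_getspecific(counter_key);
    if (p == NULL) {
        p = calloc(1, sizeof *p);
        if (p == NULL) abort();
        pthread_setspecific(counter_key, p);
    }
    return p;
}

/* Transformed version of bump(): touches only the caller's copy. */
static void bump(void) { (*counter_ptr())++; }

struct node_arg { int rank; int result; };

/* Each node increments its own counter rank + 1 times. */
static void *node_main(void *v) {
    struct node_arg *a = v;
    for (int i = 0; i <= a->rank; i++) bump();
    a->result = *counter_ptr();
    return NULL;
}

/* Runs n (> 0) nodes as threads; results[i] receives node i's final counter.
 * Returns 0 on success, -1 on allocation or thread-creation failure. */
int run_nodes(int n, int *results) {
    pthread_t *tid = malloc(n * sizeof *tid);
    struct node_arg *args = malloc(n * sizeof *args);
    if (tid == NULL || args == NULL) { free(tid); free(args); return -1; }
    int started = 0, rc = 0;
    for (int i = 0; i < n; i++) {
        args[i].rank = i;
        args[i].result = 0;
        if (pthread_create(&tid[i], NULL, node_main, &args[i]) != 0) { rc = -1; break; }
        started++;
    }
    for (int i = 0; i < started; i++) pthread_join(tid[i], NULL);
    for (int i = 0; i < started; i++) results[i] = args[i].result;
    free(tid);
    free(args);
    return rc;
}

#ifdef DEMO_MAIN
int main(void) {
    int results[4];
    assert(run_nodes(4, results) == 0);
    for (int i = 0; i < 4; i++) printf("node %d counter = %d\n", i, results[i]);
    return 0;
}
#endif
```

Compiled with `-pthread -DDEMO_MAIN`, the demo runs four nodes in one address space. Because every access goes through `counter_ptr()`, node `i` ends with a counter of `i + 1` regardless of scheduling, whereas a shared global would yield results that depend on interleaving.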