Globalizing selectively: shared-memory efficiency with address-space separation

Authors:
Nilesh Mahajan;Uday Pitambare;Arun Chauhan
Affiliations:
Indiana University, Bloomington, IN;Indiana University, Bloomington, IN;Indiana University, Bloomington, IN
Venue:
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Year:
2013

Abstract

It has become common for MPI-based applications to run on shared-memory machines. However, MPI semantics prevent the MPI library itself from fully exploiting shared memory for communication between processes. This paper presents an approach that combines compiler transformations with a specialized runtime system to achieve zero-copy communication whenever possible. It does so by proving certain properties statically and by globalizing data selectively, altering the allocation and deallocation of communication buffers. When such proofs are not possible statically, the runtime system provides dynamic optimization by copying data only when there are write-write or read-write conflicts. We implemented a prototype compiler using ROSE and evaluated it on several benchmarks. Our system produces code that performs better than MPI in most cases and, in all cases, no worse than MPI tuned for shared memory.
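
The copy-only-on-conflict idea in the abstract can be illustrated with a small single-process simulation. This is a minimal sketch, not the paper's runtime: it uses a plain copy-on-write scheme as a stand-in, and every name in it (`Buffer`, `Handle`, `send`, `holders`, `copies`) is hypothetical. A "send" hands the receiver a reference to the same storage (zero-copy), and a private copy is made only when a rank writes to storage that another rank still references.

```python
class Buffer:
    """Hypothetical globalized storage that several simulated ranks may reference."""

    def __init__(self, data):
        self.data = list(data)
        self.holders = 1  # number of ranks currently referencing this storage


class Handle:
    """One simulated rank's view of a buffer (an assumption, not an MPI type)."""

    def __init__(self, buf):
        self.buf = buf
        self.copies = 0  # how many times this rank had to copy

    def read(self, i):
        # Reads never conflict with other reads, so no copy is needed.
        return self.buf.data[i]

    def write(self, i, value):
        if self.buf.holders > 1:
            # Another rank still references the storage: a read-write or
            # write-write conflict is possible, so take a private copy first.
            self.buf.holders -= 1
            self.buf = Buffer(self.buf.data)
            self.copies += 1
        self.buf.data[i] = value


def send(sender):
    """Simulated zero-copy send: the receiver shares the sender's storage."""
    sender.buf.holders += 1
    return Handle(sender.buf)


src = Handle(Buffer([1, 2, 3]))
dst = send(src)               # no data is copied here
shared = dst.buf is src.buf   # True: both ranks see the same storage
src.write(0, 99)              # conflict with dst, so src copies before writing
dst.write(1, 5)               # dst is now the sole holder, so it writes in place
```

In this sketch the receiver still sees the message as it was sent (`dst.read(0)` is `1`) while the sender sees its own update (`src.read(0)` is `99`), and only one copy is made in total. Where a compiler could prove statically that no such conflict occurs, even the `holders` check would be unnecessary.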