Communicating sequential processes
Communications of the ACM
Program transformation and runtime support for threaded MPI execution on shared-memory machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
Resource-Based Communication Placement Analysis
LCPC '96 Proceedings of the 9th International Workshop on Languages and Compilers for Parallel Computing
Operating System Concepts
MPI-aware compiler optimizations for improving communication-computation overlap
Proceedings of the 23rd international conference on Supercomputing
Communication-Sensitive Static Dataflow for Parallel Message Passing Applications
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Capabilities for uniqueness and borrowing
ECOOP'10 Proceedings of the 24th European conference on Object-oriented programming
Inferring ownership transfer for efficient message passing
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Formal analysis of MPI-based parallel programs
Communications of the ACM
Implementation and shared-memory evaluation of MPICH2 over the nemesis communication subsystem
EuroPVM/MPI'06 Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface
Partial globalization of partitioned address spaces for zero-copy communication with shared memory
HIPC '11 Proceedings of the 2011 18th International Conference on High Performance Computing
Ownership passing: efficient distributed memory programming on multi-core systems
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Hi-index | 0.00 |
It has become common for MPI-based applications to run on shared-memory machines. However, MPI semantics do not allow leveraging shared memory fully for communication between processes from within the MPI library. This paper presents an approach that combines compiler transformations with a specialized runtime system to achieve zero-copy communication whenever possible by proving certain properties statically and globalizing data selectively by altering the allocation and deallocation of communication buffers. The runtime system provides dynamic optimization, when such proofs are not possible statically, by copying data only when there are write-write or read-write conflicts. We implemented a prototype compiler, using ROSE, and evaluated it on several benchmarks. Our system produces code that performs better than MPI in most cases and no worse than MPI, tuned for shared memory, in all cases.