Multi-core shared-memory architectures are ubiquitous in both High-Performance Computing (HPC) and commodity systems because they offer an excellent trade-off between performance and programmability. MPI's abstraction of explicit communication across distributed memory is very popular for programming scientific applications. Unfortunately, OS-level process separation forces MPI to copy messages unnecessarily within shared-memory nodes. This paper presents a novel approach that transparently shares memory across MPI processes executing on the same node, allowing them to communicate like threaded applications. While prior work has explored thread-based MPI libraries, we demonstrate that such approaches are impractical and perform poorly in practice. We instead propose a novel process-based approach that enables shared-memory communication and integrates with existing MPI libraries and applications without modification. Our protocols for shared-memory message passing exhibit better performance and a reduced cache footprint. Communication speedups of more than 26% are demonstrated for two applications.