Optimizing threaded MPI execution on SMP clusters
ICS '01 Proceedings of the 15th international conference on Supercomputing
Composing high-performance memory allocators
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Hoard: a scalable memory allocator for multithreaded applications
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Design and Evaluation of Nemesis, a Scalable, Low-Latency, Message-Passing Communication Subsystem
CCGRID '06 Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
CCGRID '07 Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid
MPC: A Unified Parallel Runtime for Clusters of NUMA Machines
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Thread-local storage extension to support thread-based MPI/OpenMP applications
IWOMP'11 Proceedings of the 7th international conference on OpenMP in the Petascale era
Enabling low-overhead hybrid MPI/OpenMP parallelism with MPC
IWOMP'10 Proceedings of the 6th international conference on Beyond Loop Level Parallelism in OpenMP: accelerators, Tasking and more
Improving MPI communication overlap with collaborative polling
EuroMPI'12 Proceedings of the 19th European conference on Recent Advances in the Message Passing Interface
Ownership passing: efficient distributed memory programming on multi-core systems
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
NUMA-aware shared-memory collective communication for MPI
Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
In-place algorithms for the symmetric all-to-all exchange with MPI
Proceedings of the 20th European MPI Users' Group Meeting
Introducing kernel-level page reuse for high performance computing
Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Hi-index | 0.00 |
Message-Passing Interface (MPI) has become a standard for parallel applications in high-performance computing. Within a single cluster node, MPI implementations benefit from the shared memory to speed-up intra-node communications while the underlying network protocol is exploited to communicate between nodes. However, it requires the allocation of additional buffers leading to a memory-consumption overhead. This may become an issue on future clusters with reduced memory amount per core. In this article, we propose an MPI implementation built upon the MPC framework called MPC-MPI reducing the overall memory footprint. We obtained up to 47% of memory gain on benchmarks and a real-world application.