MPC-MPI: An MPI Implementation Reducing the Overall Memory Consumption

Authors:
Marc Pérache;Patrick Carribault;Hervé Jourdren
Affiliations:
CEA, DAM, DIF, Arpajon, France F-91297;CEA, DAM, DIF, Arpajon, France F-91297;CEA, DAM, DIF, Arpajon, France F-91297
Venue:
Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Year:
2009

Abstract

The Message-Passing Interface (MPI) has become a standard for parallel applications in high-performance computing. Within a single cluster node, MPI implementations exploit shared memory to speed up intra-node communications, while the underlying network protocol handles communication between nodes. However, this approach requires allocating additional buffers, which leads to a memory-consumption overhead. This may become an issue on future clusters with less memory per core. In this article, we propose MPC-MPI, an MPI implementation built upon the MPC framework that reduces the overall memory footprint. We obtained memory savings of up to 47% on benchmarks and a real-world application.