A multithreaded message passing interface (MPI) architecture: performance and program issues
Journal of Parallel and Distributed Computing
Optimizing threaded MPI execution on SMP clusters
ICS '01 Proceedings of the 15th international conference on Supercomputing
MiMPI: A Multithred-Safe Implementation of MPI
Proceedings of the 6th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
(Quasi-) Thread-Safe PVM and (Quasi-) Thread-Safe MPI without Active Polling
Proceedings of the 9th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
MPIDC '96 Proceedings of the Second MPI Developers Conference
Parallel Computing - OpenMp
Computer
Thread-safety in an MPI implementation: Requirements and analysis
Parallel Computing
Larrabee: a many-core x86 architecture for visual computing
ACM SIGGRAPH 2008 papers
Intel threading building blocks
Intel threading building blocks
Enabling concurrent multithreaded MPI communication on multicore petascale systems
EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
Implementing MPI on windows: comparison with common approaches on Unix
EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
Portable explicit threading and concurrent programming for MPI applications
PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part II
KNEM: A generic and scalable kernel-assisted intra-node MPI communication framework
Journal of Parallel and Distributed Computing
Concurrent programming constructs for parallel MPI applications
The Journal of Supercomputing
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
Hi-index | 0.00 |
As high-end computing systems continue to grow in scale, recent advances in multi- and many-core architectures have pushed such growth toward more dense architectures, that is, more processing elements per physical node, rather than more physical nodes themselves. Although a large number of scientific applications have relied so far on an MPI-everywhere model for programming high-end parallel systems; this model may not be sufficient for future machines, given their physical constraints such as decreasing amounts of memory per processing element and shared caches. As a result, application and computer scientists are exploring alternative programming models that involve using MPI between address spaces and some other threaded model, such as OpenMP, Pthreads, or Intel TBB, within an address space. Such hybrid models require efficient support from an MPI implementation for MPI messages sent from multiple threads simultaneously. In this paper, we explore the issues involved in designing such an implementation. We present four approaches to building a fully thread-safe MPI implementation, with decreasing levels of critical-section granularity (from coarse-grain locks to fine-grain locks to lock-free operations) and correspondingly increasing levels of complexity. We present performance results that demonstrate the performance implications of the different approaches.