To make the most effective use of parallel machines that are being built out of increasingly large multicore chips, researchers are exploring the use of programming models comprising a mixture of MPI and threads. Such hybrid models require efficient support from an MPI implementation for MPI messages sent from multiple threads simultaneously. In this paper, we explore the issues involved in designing such an implementation. We present four approaches to building a fully thread-safe MPI implementation, with decreasing levels of critical-section granularity (from coarse-grain locks to fine-grain locks to lock-free operations) and correspondingly increasing levels of complexity. We describe how we have structured our implementation to support all four approaches and enable one to be selected at build time. We present performance results with a message-rate benchmark to demonstrate the performance implications of the different approaches.
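The difference between coarse-grain and fine-grain critical sections can be illustrated with a small sketch. It is not taken from the paper's implementation: the class names, the `post` method, and the `run_senders` helper are hypothetical, and the "messages" are plain tuples appended to in-memory queues rather than real MPI traffic. The first class guards every queue with a single global lock, so all threads serialize; the second gives each queue its own lock, so threads working on different queues do not contend. The lock-free variant mentioned in the abstract is not shown, and because CPython threads share an interpreter lock, the sketch shows the locking structure only, not the performance effect.

```python
import threading
from collections import deque


class CoarseGrainedQueues:
    """All queues share one lock, so any two operations serialize."""

    def __init__(self, num_queues):
        self._queues = [deque() for _ in range(num_queues)]
        # Hypothetical single global critical section.
        self._global_lock = threading.Lock()

    def post(self, queue_id, message):
        with self._global_lock:
            self._queues[queue_id].append(message)

    def total(self):
        with self._global_lock:
            return sum(len(q) for q in self._queues)


class FineGrainedQueues:
    """Each queue has its own lock, so only same-queue operations contend."""

    def __init__(self, num_queues):
        self._queues = [deque() for _ in range(num_queues)]
        self._locks = [threading.Lock() for _ in range(num_queues)]

    def post(self, queue_id, message):
        with self._locks[queue_id]:
            self._queues[queue_id].append(message)

    def total(self):
        count = 0
        for lock, queue in zip(self._locks, self._queues):
            with lock:
                count += len(queue)
        return count


def run_senders(queues, num_threads, messages_per_thread, num_queues):
    """Run concurrent sender threads and return how many messages were stored."""

    def sender(thread_id):
        for i in range(messages_per_thread):
            # Spread each thread's messages across the queues.
            queues.post((thread_id + i) % num_queues, (thread_id, i))

    threads = [threading.Thread(target=sender, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return queues.total()
```

Both variants must store every posted message; they differ only in how much unrelated work each critical section blocks. Calling `run_senders(FineGrainedQueues(4), 8, 500, 4)` posts from eight threads into four queues and returns the number of messages stored.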