With the ever-increasing numbers of cores per node on HPC systems, applications are increasingly using threads to exploit the shared memory within a node, combined with MPI across nodes. Achieving high performance when a large number of concurrent threads make MPI calls is a challenging task for an MPI implementation. We describe the design and implementation of our solution in MPICH2 to achieve high-performance multithreaded communication on the IBM Blue Gene/P. We use a combination of a multichannel-enabled network interface, fine-grained locks, lock-free atomic operations, and specially designed queues to provide a high degree of concurrent access while still maintaining MPI's message-ordering semantics. We present performance results demonstrating that our new design improves the multithreaded message rate by a factor of 3.6 over the existing implementation on the BG/P. Our solutions are also applicable to other high-end systems that have parallel network access capabilities.