As parallel applications grow more complex, their demand for computational power keeps increasing. Recent trends in High-Performance Computing (HPC) show that improvements in single-core performance will not be sufficient to meet the challenges of an exascale machine: we expect an enormous growth in the number of cores as well as a multiplication of the data volume exchanged across compute nodes. To scale applications up to exascale, the communication layer has to minimize the time spent waiting for network messages. This paper presents a message progression scheme based on Collaborative Polling, which allows an efficient, auto-adaptive overlap of communication phases with computation. The approach is new in that it increases the overlap potential of the application without introducing the overhead of threaded message progression. We implemented our approach for InfiniBand in a thread-based MPI runtime called MPC. We evaluate the gain from Collaborative Polling on the NAS Parallel Benchmarks and three scientific applications, where we show significant improvements in communication time, up to a factor of 2.
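The abstract does not spell out the mechanism, so the following is only a minimal sketch under one assumption: a task that is blocked waiting for its own message uses that idle time to process pending messages of other tasks on the same node, rather than relying on a dedicated progression thread. All names here (`Task`, `poll`, `wait_for`, `demo`) are hypothetical and are not part of MPC or any MPI API; plain Python queues stand in for the InfiniBand layer.

```python
import threading
import time
from collections import deque


class Task:
    """Hypothetical per-task state (not an MPC data structure)."""

    def __init__(self, rank):
        self.rank = rank
        self.pending = deque()        # messages delivered but not yet processed
        self.lock = threading.Lock()  # protects `pending`
        self.received = []            # messages that have been processed
        self.handled_by = []          # rank that processed each message


def poll(task, helper_rank):
    """Process at most one pending message of `task`; return True if one was handled."""
    # Try-lock: if another task is already progressing this queue, skip it.
    if not task.lock.acquire(blocking=False):
        return False
    try:
        if not task.pending:
            return False
        task.received.append(task.pending.popleft())
        task.handled_by.append(helper_rank)
        return True
    finally:
        task.lock.release()


def wait_for(me, tasks, expected):
    """Wait until `me` has `expected` messages, helping other tasks in the meantime."""
    while len(me.received) < expected:
        if poll(me, me.rank):
            continue                  # own traffic comes first
        for other in tasks:
            if other is not me:
                poll(other, me.rank)  # collaborative step: progress someone else


def demo():
    t0, t1 = Task(0), Task(1)
    t1.pending.extend(["a", "b"])     # t1 is "computing" and never polls

    def late_delivery():
        # Assumed stand-in for the network: t0's message arrives only
        # after t1's backlog has been drained.
        while len(t1.received) < 2:
            time.sleep(0.001)
        with t0.lock:
            t0.pending.append("c")

    net = threading.Thread(target=late_delivery)
    net.start()
    wait_for(t0, [t0, t1], expected=1)
    net.join()
    return t0, t1
```

In `demo()`, task 1 never calls `poll`, yet both of its messages end up processed, with `handled_by` recording rank 0 for each: task 0 handled them while it was waiting for its own message. The non-blocking lock acquisition keeps a waiting task from stalling on a queue that someone else is already progressing, which is one way such a scheme could avoid the contention of a dedicated progression thread.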