Modern MPI implementations provide several communication channels for optimizing performance. To obtain the best performance for the most demanding contemporary applications, these channels must be managed efficiently. Designing the MPI layer requires weighing the overhead of message discovery against the thresholds used to select among channels. Choosing these parameters is not trivial, since application characteristics and the demands placed on the MPI layer vary widely. In this paper we address these issues. We propose several schemes, such as static priority and dynamic priority, to implement channel polling efficiently. Our results indicate that we can reduce intra-node latency by up to 12% and message discovery time by up to 45%. Further, we explore several methodologies for choosing appropriate thresholds for the different channels.