On SMP clusters, mixed-mode collective MPI communications, which use shared-memory communication within each SMP node and point-to-point communication between nodes, are more efficient than conventional implementations. In a previous study, we proposed several new methods that made mixed-mode collective communications significantly faster than pure point-to-point ones. However, achieving optimal performance required tuning many parameters, which was done by exhaustively testing every possible setting and was very time consuming. In this study, we propose a new performance model that captures the special characteristics of mixed-mode collective communications. Unlike existing performance models, which consider only point-to-point communication, our model accounts for both shared-memory and point-to-point communication, and its predictions are accurate enough to rule out most parameter settings without testing them by execution. Based on this model, we develop a number of tuning strategies that reduce the overall tuning time to roughly 10% of the original.
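To illustrate the idea of a two-level performance model, the sketch below predicts broadcast cost as an inter-node point-to-point term plus an intra-node shared-memory term. All function names, tree shapes, and latency/bandwidth constants are hypothetical placeholders for illustration, not the model or parameters from the paper; the inter-node term follows a simple Hockney-style alpha-beta cost.

```python
import math

def p2p_cost(msg_bytes, alpha, beta):
    """Time for one message transfer: startup latency + bytes / bandwidth."""
    return alpha + msg_bytes * beta

def pure_p2p_bcast_cost(msg_bytes, nodes, cores_per_node,
                        alpha_net=5e-6, beta_net=1e-9):
    """Conventional broadcast: a binomial tree over all processes,
    where every edge is a network-class point-to-point message."""
    total_procs = nodes * cores_per_node
    return math.ceil(math.log2(total_procs)) * p2p_cost(msg_bytes, alpha_net, beta_net)

def mixed_mode_bcast_cost(msg_bytes, nodes, cores_per_node,
                          alpha_net=5e-6, beta_net=1e-9,    # network: ~5 us, ~1 GB/s
                          alpha_shm=5e-7, beta_shm=2e-10):  # shared memory: ~0.5 us, ~5 GB/s
    """Mixed-mode broadcast: a binomial tree over one leader per node
    via the network, then a shared-memory tree inside each node."""
    inter = math.ceil(math.log2(nodes)) * p2p_cost(msg_bytes, alpha_net, beta_net)
    intra = math.ceil(math.log2(cores_per_node)) * p2p_cost(msg_bytes, alpha_shm, beta_shm)
    return inter + intra

# A tuner can rank candidate settings by predicted cost and only
# benchmark the most promising ones, instead of timing every setting.
msg = 64 * 1024
print(mixed_mode_bcast_cost(msg, nodes=16, cores_per_node=8))
print(pure_p2p_bcast_cost(msg, nodes=16, cores_per_node=8))
```

With these (assumed) parameters the mixed-mode prediction is lower than the pure point-to-point one, reflecting the paper's premise that intra-node hops over shared memory are much cheaper than network hops.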