The performance of MPI collective communications is critical in most MPI-based applications. A general algorithm for a given collective operation may not perform well on all systems because of differences in architecture, network parameters, and the storage capacity of the underlying MPI implementation. Collective communications therefore have to be tuned for the system on which they will run. To determine the optimum parameters of collective communications on a given system in a time-efficient manner, the collectives must be modeled efficiently. In this paper, we discuss various techniques for modeling collective communications.
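As a minimal illustration of the kind of modeling involved, the sketch below uses the classic Hockney point-to-point cost model (latency α plus per-byte time β) to compare two broadcast algorithms analytically. The α/β values and the two algorithm variants are illustrative assumptions for this sketch, not parameters or algorithms taken from the paper; a real tuning framework would measure these parameters on the target system.

```python
# Sketch: choosing a broadcast algorithm with the Hockney model t = alpha + beta*m.
# ALPHA and BETA below are assumed values, not measurements.
import math

ALPHA = 5e-6   # per-message startup latency in seconds (assumed)
BETA = 1e-9    # per-byte transfer time in seconds (assumed)

def p2p_time(msg_bytes):
    """Hockney model: cost of one point-to-point message of msg_bytes."""
    return ALPHA + BETA * msg_bytes

def linear_bcast(p, msg_bytes):
    """Flat broadcast: the root sends to each of the p-1 other ranks in turn."""
    return (p - 1) * p2p_time(msg_bytes)

def binomial_bcast(p, msg_bytes):
    """Binomial-tree broadcast: ceil(log2 p) communication rounds."""
    return math.ceil(math.log2(p)) * p2p_time(msg_bytes)

def pick_bcast(p, msg_bytes):
    """Return the name of the cheaper algorithm under this model."""
    candidates = {"linear": linear_bcast(p, msg_bytes),
                  "binomial": binomial_bcast(p, msg_bytes)}
    return min(candidates, key=candidates.get)

# A tuned library would evaluate such models per (process count, message size)
# instead of timing every algorithm exhaustively on the target machine.
for p in (4, 64):
    for m in (1 << 10, 1 << 20):
        print(p, m, pick_bcast(p, m))
```

Under this simple model the binomial tree dominates for p > 2; more refined models (e.g. ones that account for network congestion or segmentation) can change the crossover points, which is why the choice must be made per system.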