Previous studies of application usage show that the performance of collective communication is critical for high-performance computing, yet it is often overlooked in favor of point-to-point performance. In this paper, we analyze and attempt to improve intra-cluster collective communication in the context of the widely deployed MPI programming paradigm by extending accepted models of point-to-point communication, such as Hockney, LogP/LogGP, and PLogP. We compared the models' predictions with experimentally gathered data and used our findings to optimize the implementation of collective operations in the FT-MPI library.
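As a rough illustration of what extending a point-to-point model to a collective can look like, the sketch below uses the standard Hockney form, T(m) = alpha + beta * m, and applies it to two textbook broadcast algorithms. It is not taken from the paper: the function names, the choice of linear and binomial-tree broadcast, and the assumption that every round costs one full point-to-point transfer with no contention or overlap are all illustrative assumptions.

```python
def hockney_p2p(m_bytes, alpha, beta):
    """Hockney point-to-point time: latency alpha plus beta (time per byte) times size."""
    return alpha + beta * m_bytes


def bcast_linear(p, m_bytes, alpha, beta):
    """Assumed linear broadcast: the root sends to each of the other p - 1 ranks in turn."""
    return (p - 1) * hockney_p2p(m_bytes, alpha, beta)


def bcast_binomial(p, m_bytes, alpha, beta):
    """Assumed binomial-tree broadcast: holders double each round, ceil(log2 p) rounds."""
    rounds = (p - 1).bit_length()  # integer ceil(log2(p)) for p >= 1
    return rounds * hockney_p2p(m_bytes, alpha, beta)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to make the comparison visible.
    alpha, beta = 50e-6, 1e-8  # seconds, seconds per byte
    for p in (4, 16, 64):
        lin = bcast_linear(p, 1024, alpha, beta)
        bino = bcast_binomial(p, 1024, alpha, beta)
        print(f"p={p:3d}  linear={lin:.6f}s  binomial={bino:.6f}s")
```

Under these assumptions, a predicted time for each candidate algorithm can be compared against measured timings, which is the kind of model-versus-experiment comparison the abstract describes.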