LogP: towards a realistic model of parallel computation
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
The communication challenge for MPP: Intel Paragon and Meiko CS-2
Parallel Computing
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems
IEEE Transactions on Parallel and Distributed Systems
MagPIe: MPI's collective communication operations for clustered wide area systems
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Automatically tuned collective communications
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Network performance-aware collective communication for clustered wide-area systems
Parallel Computing - Clusters and computational grids for scientific computing
Introduction to Parallel Computing
Introduction to Parallel Computing
Assessing Fast Network Interfaces
IEEE Micro
Fast Measurement of LogP Parameters for Message Passing Platforms
IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
An Evaluation of Current High-Performance Networks
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
Efficient implementation of reduce-scatter in MPI
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Parallel, distributed and network-based processing
On optimizing collective communication
CLUSTER '04 Proceedings of the 2004 IEEE International Conference on Cluster Computing
Advanced collective communication in aspen
Proceedings of the 22nd annual international conference on Supercomputing
A Tool for Optimizing Runtime Parameters of Open MPI
Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Accurate and Efficient Estimation of Parameters of Heterogeneous Communication Performance Models
International Journal of High Performance Computing Applications
Two-tree algorithms for full bandwidth broadcast, reduction and scan
Parallel Computing
Modeling advanced collective communication algorithms on cell-based systems
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Accurate Heterogeneous Communication Models and a Software Tool for Their Efficient Estimation
International Journal of High Performance Computing Applications
A survey of algorithmic skeleton frameworks: high-level structured parallel programming enablers
Software—Practice & Experience - Focus on Selected PhD Literature Reviews in the Practical Aspects of Software Technology
Two algorithms of irregular scatter/gather operations for heterogeneous platforms
EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
Design of efficient Java message-passing collectives on multi-core clusters
The Journal of Supercomputing
Cosmic microwave background map-making at the petascale and beyond
Proceedings of the international conference on Supercomputing
High-performance high-resolution semi-Lagrangian tracer transport on a sphere
Journal of Computational Physics
Improving communication performance in dense linear algebra via topology aware collectives
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
An overview of CMPI: network performance aware MPI in the cloud
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Automatic performance optimization of the discrete fourier transform on distributed memory computers
ISPA'06 Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications
Improving large graph processing on partitioned graphs in the cloud
Proceedings of the Third ACM Symposium on Cloud Computing
The impact of system design parameters on application noise sensitivity
Cluster Computing
LibWater: heterogeneous distributed computing made easy
Proceedings of the 27th international ACM conference on International conference on supercomputing
Bandwidth-optimal all-to-all exchanges in fat tree networks
Proceedings of the 27th international ACM conference on International conference on supercomputing
On the performance of concurrent transfers in collective algorithms
Proceedings of the 20th European MPI Users' Group Meeting
Hi-index | 0.00 |
Previous studies of application usage show that the performance of collective communications are critical for high-performance computing. Despite active research in the field, both general and feasible solution to the optimization of collective communication problem is still missing.In this paper, we analyze and attempt to improve intra-cluster collective communication in the context of the widely deployed MPI programming paradigm by extending accepted models of point-to-point communication, such as Hockney, LogP/LogGP, and PLogP, to collective operations. We compare the predictions from models against the experimentally gathered data and using these results, construct optimal decision function for broadcast collective. We quantitatively compare the quality of the model-based decision functions to the experimentally-optimal one. Additionally, in this work, we also introduce a new form of an optimized tree-based broadcast algorithm, splitted-binary.Our results show that all of the models can provide useful insights into various aspects of the different algorithms as well as their relative performance. Still, based on our findings, we believe that the complete reliance on models would not yield optimal results. In addition, our experimental results have identified the gap parameter as being the most critical for accurate modeling of both the classical point-to-point-based pipeline and our extensions to fan-out topologies.