In this paper, we study the all-to-all multicast operation. Strategies for all-to-all multicast need to be different for small and large messages: for small messages, the major issue is the minimization of software overhead, whereas for large messages, the issue is network contention. Many modern large parallel computers use the fat-tree interconnection topology. We therefore analyze network contention on fat-tree networks and use known contention-free communication schedules on fat trees in the design of two novel strategies for optimizing collective multicast. We evaluate the performance of these strategies with up to 256 nodes (1024 processors) on an Alpha cluster, and present schemes that perform well even when a contiguous chunk of nodes is not available. For large messages, many of our strategies achieve twice the throughput of native MPI. We also demonstrate that the software overhead of a collective operation is a small fraction of the total completion time in the presence of a communication co-processor. We therefore compare the performance of the studied strategies using two metrics: (i) completion time, and (ii) computation overhead.
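To make the idea of a contention-free communication schedule concrete, the sketch below simulates a classic shift-based all-to-all schedule: in step k, every node i sends its message to node (i + k) mod n, so each step is a permutation of the nodes (no two senders target the same receiver). This is a generic textbook schedule, not the specific fat-tree schedules the paper designs; the function names are illustrative only.

```python
def all_to_all_schedule(n):
    """Shift-based all-to-all multicast schedule for n nodes.

    Returns n-1 steps; step k pairs each sender i with receiver
    (i + k) % n, so every step is a permutation (each receiver
    appears exactly once, avoiding endpoint contention).
    """
    return [[(i, (i + k) % n) for i in range(n)] for k in range(1, n)]

def simulate(n):
    """Run the schedule and return the set of source messages each node holds."""
    received = {i: {i} for i in range(n)}  # each node starts with only its own message
    for step in all_to_all_schedule(n):
        for src, dst in step:
            received[dst].add(src)
    return received

# After n-1 steps, every node holds all n messages.
result = simulate(8)
assert all(result[i] == set(range(8)) for i in range(8))
```

On a fat tree, avoiding endpoint contention in this way is necessary but not sufficient: the paper's contribution is choosing step permutations that also avoid contention on the internal links of the tree.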