We develop a message scheduling scheme that can theoretically achieve the maximum throughput for all-to-all personalized communication (AAPC) on any given Ethernet switched cluster. Based on the scheduling scheme, we implement an automatic routine generator that takes the topology information as input and produces a customized MPI Alltoall routine, a routine in the Message Passing Interface (MPI) standard that realizes AAPC. Experimental results show that the automatically generated routine consistently outperforms other MPI Alltoall algorithms, including those in LAM/MPI and MPICH, on Ethernet switched clusters with different network topologies when the message size is sufficiently large. This demonstrates the superiority of the proposed AAPC algorithm in exploiting network bandwidth.
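The "maximum throughput" for AAPC is limited by the most heavily loaded link. The sketch below is a minimal illustration under stated assumptions, not the paper's scheduling scheme or routine generator. It assumes the cluster forms a tree of switches and machines (as a switched Ethernet does), that every link is full-duplex with the same bandwidth B, and that every machine sends one msize-byte message to every other machine. Under these assumptions, a link whose removal splits the tree into parts holding n1 and n2 machines must carry n1 * n2 messages in each direction. AAPC therefore cannot complete in less than max(n1 * n2) * msize / B. The function `aapc_lower_bound` and the edge-list encoding are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def aapc_lower_bound(edges, machines, msize_bytes, bandwidth_bytes_per_s):
    """Hypothetical helper: bottleneck-link lower bound on AAPC completion time.

    Assumes `edges` describe a tree (switches and machines as nodes) and that
    every link is full-duplex with the same bandwidth. Returns the bottleneck
    edge and the minimum time in seconds.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    machines = set(machines)
    total = len(machines)

    # Iterative DFS from an arbitrary root; `order` is a preorder, so every
    # node appears after its parent.
    root = next(iter(adj))
    parent = {root: None}
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                stack.append(nxt)

    # Count machines in each subtree by walking the preorder backwards
    # (children are always processed before their parents).
    below = {node: (1 if node in machines else 0) for node in order}
    for node in reversed(order):
        if parent[node] is not None:
            below[parent[node]] += below[node]

    # The link (parent, node) separates below[node] machines from the rest;
    # it must carry n1 * n2 messages in each direction.
    best_edge, best_msgs = None, 0
    for node in order:
        p = parent[node]
        if p is None:
            continue
        n1 = below[node]
        n2 = total - n1
        if n1 * n2 > best_msgs:
            best_edge, best_msgs = (p, node), n1 * n2
    return best_edge, best_msgs * msize_bytes / bandwidth_bytes_per_s
```

The example below uses two switches with four machines each, 1 MB messages, and 100 Mbps (12.5e6 bytes/s) links. The inter-switch link carries 4 * 4 = 16 messages per direction, while each machine link carries only 1 * 7 = 7. The inter-switch link is therefore the bottleneck, and no schedule can finish in less than about 1.28 s:

```python
edges = [("s0", "s1")] + [("s0", f"n{i}") for i in range(4)] + [("s1", f"n{i}") for i in range(4, 8)]
edge, t = aapc_lower_bound(edges, [f"n{i}" for i in range(8)], 1_000_000, 12_500_000)
```

A schedule that keeps the bottleneck link busy throughout, without contention, reaches this bound. A topology-oblivious algorithm can leave that link idle in some phases, which is one reason a topology-aware routine can do better on large messages.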