We develop a message scheduling scheme for efficiently realizing all-to-all personalized communication (AAPC) on Ethernet switched clusters with one or more switches. To avoid network contention and achieve high performance, the scheme partitions AAPC into phases such that 1) there is no network contention within each phase and 2) the number of phases is minimal. Realizing AAPC with the contention-free phases computed by the scheduling algorithm can therefore potentially achieve the minimum communication completion time. In practice, phased AAPC schemes must introduce synchronizations to separate messages in different phases. We investigate various synchronization mechanisms and various methods for incorporating synchronizations into the AAPC phases. Experimental results show that message-scheduling-based AAPC implementations with proper synchronization consistently achieve high performance on clusters with many different network topologies when the message size is large.
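
To make the idea of contention-free phases concrete, the following is a minimal sketch for the simplest case only: a single switch with full-duplex links. It is not the scheduling algorithm of the paper, which also handles clusters with multiple switches. The function names `single_switch_phases` and `is_contention_free` are hypothetical and introduced purely for illustration. The sketch uses the classic rotation schedule, in which machine i sends to machine (i + p) mod n in phase p, so each machine sends exactly one message and receives exactly one message per phase.

```python
def single_switch_phases(n):
    """Build a phased AAPC schedule for n machines on ONE switch (assumed topology).

    Phase p (1 <= p <= n-1) contains the messages i -> (i + p) mod n.
    Returns a list of n-1 phases, each a list of (sender, receiver) pairs.
    """
    return [[(i, (i + p) % n) for i in range(n)] for p in range(1, n)]


def is_contention_free(phase):
    """Check the single-switch contention condition for one phase.

    With full-duplex links to one switch, two messages contend only if they
    share a sender (same uplink) or share a receiver (same downlink).
    """
    senders = [s for s, _ in phase]
    receivers = [r for _, r in phase]
    return (len(set(senders)) == len(senders)
            and len(set(receivers)) == len(receivers))


if __name__ == "__main__":
    n = 4  # illustrative cluster size
    for k, phase in enumerate(single_switch_phases(n), start=1):
        print(f"phase {k}: {phase} contention-free={is_contention_free(phase)}")
```

In this single-switch setting every machine must send n - 1 distinct messages over its one link, so n - 1 phases is the fewest possible, and the rotation schedule meets that bound. In a real implementation a synchronization step (for example, a barrier) would separate consecutive phases, as the abstract notes.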