The broadcast function is one of the most used collective communication functions of the Message Passing Interface (MPI) library. Broadcasts are usually implemented with invariable algorithms, which fail to yield the best performance with all kinds of applications in all execution environments. This problem should be addressed, since the performance of the function has great influence on MPI-based applications. In this paper, we present, simulate, analytically model, verify and analyze RMBcast, a reconfigurable broadcast function, which presents variable structures/behaviors, in order to optimize flexibility and performance. Our results show that reconfiguration at the algorithm level yields flexibility and performance gains over traditional broadcast functions, which are implemented with invariable algorithms found in well-known implementations of the MPI standard (i.e., MPICH and LAM/MPI).
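To make the idea of algorithm-level reconfiguration concrete, the following is a minimal sketch of choosing a broadcast algorithm per call instead of fixing one in advance. It is not RMBcast and does not reproduce the paper's analytical model. It assumes a textbook latency-bandwidth cost model in which sending m bytes takes alpha + m * beta seconds. The function names, the three candidate algorithms, the segment count, and the default alpha and beta values are all illustrative assumptions.

```python
# Illustrative sketch only: selects a broadcast algorithm from simple
# latency-bandwidth cost estimates. All names and constants are hypothetical.

def flat_cost(n, m, alpha, beta):
    # Flat tree: the root sends the whole message to each of the
    # n - 1 other ranks, one after another.
    return (n - 1) * (alpha + m * beta)

def binomial_cost(n, m, alpha, beta):
    # Binomial tree: the number of ranks holding the data doubles each
    # step, so ceil(log2(n)) steps are needed.
    steps = (n - 1).bit_length()  # integer ceil(log2(n)) for n >= 1
    return steps * (alpha + m * beta)

def pipeline_cost(n, m, alpha, beta, segments=16):
    # Segmented chain: the message is cut into equal segments that flow
    # down a chain of ranks; the last segment arrives at the last rank
    # after (n - 1) + (segments - 1) segment transfers.
    return (n - 2 + segments) * (alpha + (m / segments) * beta)

COST_MODELS = {
    "flat": flat_cost,
    "binomial": binomial_cost,
    "pipeline": pipeline_cost,
}

def select_broadcast(n, m, alpha=50e-6, beta=1e-8):
    """Return the name of the algorithm with the lowest estimated cost.

    n: number of ranks, m: message size in bytes,
    alpha: assumed per-message latency in seconds,
    beta: assumed per-byte transfer time in seconds.
    """
    if n < 2:
        raise ValueError("a broadcast needs at least two ranks")
    costs = {name: model(n, m, alpha, beta) for name, model in COST_MODELS.items()}
    return min(costs, key=costs.get)

if __name__ == "__main__":
    for size in (100, 1_000_000):
        print(size, select_broadcast(8, size))
```

Under these assumed parameters and 8 ranks, the selector picks the binomial tree for a 100-byte message and the segmented pipeline for a 1 MB message, which illustrates why a single fixed algorithm cannot be best for every message size and environment.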