With the advent of new routing methods, the distance that a message is sent is becoming relatively less and less important. Thus, assuming no link contention, permutation seems to be an efficient collective communication primitive. In this paper, we present several algorithms for decomposing all-to-many personalized communication into a set of disjoint partial permutations. We study their effectiveness from the view of static scheduling as well as run-time scheduling. An approximate analysis shows that with n processors, and assuming that every processor sends and receives d messages to random destinations, our algorithm can perform the scheduling in O(dn ln d) time, on average, and can use an expected number of d + log d partial permutations to carry out the communication. We present experimental results of our algorithms on the CM-5.
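To make the idea of decomposing a communication pattern into disjoint partial permutations concrete, the following is a minimal sketch of one simple greedy heuristic. It is not one of the algorithms of this paper, and it makes no claim to the O(dn ln d) scheduling time or the d + log d phase count stated above. The function names `decompose_into_partial_permutations` and `random_pattern` are hypothetical, as is the way the random pattern is built (as a union of d random permutations, so that every processor sends and receives exactly d messages).

```python
import random


def decompose_into_partial_permutations(send_lists):
    """Greedily split a communication pattern into partial permutations.

    send_lists[i] lists the destinations that processor i must send to.
    Returns a list of phases. Each phase is a dict {source: destination}
    in which every source and every destination appears at most once,
    so a phase can be carried out as one partial permutation.
    """
    # Work on a copy so the caller's pattern is left untouched.
    pending = [list(dests) for dests in send_lists]
    phases = []
    while any(pending):
        busy_receivers = set()
        phase = {}
        for src, dests in enumerate(pending):
            # Give this source its first pending message whose receiver
            # is still free in the current phase, if there is one.
            for k, dst in enumerate(dests):
                if dst not in busy_receivers:
                    busy_receivers.add(dst)
                    phase[src] = dst
                    dests.pop(k)
                    break
        phases.append(phase)
    return phases


def random_pattern(n, d, seed=0):
    """Hypothetical test pattern: the union of d random permutations,
    so each of the n processors sends d and receives d messages."""
    rng = random.Random(seed)
    send_lists = [[] for _ in range(n)]
    for _ in range(d):
        perm = list(range(n))
        rng.shuffle(perm)
        for src, dst in enumerate(perm):
            send_lists[src].append(dst)
    return send_lists


if __name__ == "__main__":
    pattern = random_pattern(n=16, d=4)
    phases = decompose_into_partial_permutations(pattern)
    print("phases used:", len(phases))
```

Because each processor sends at most one message per phase, any schedule for such a pattern needs at least d phases. How far a greedy schedule exceeds d depends on the pattern and on the order in which sources are visited.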