A central question in parallel computing is the extent to which parallel programs can be written in a high-level, general-purpose, architecture-independent programming language and executed on a variety of parallel and distributed architectures without sacrificing efficiency. A large body of research suggests that, at least in theory, general-purpose parallel computing is indeed possible provided two conditions are met: the program has an excess of logical parallelism, and the target architecture can efficiently realize balanced communication patterns. The canonical example of a balanced communication pattern is an h-relation, in which each processor is the origin and destination of at most h messages. A plethora of protocols has been designed for routing h-relations in a variety of networks. The goal has been to minimize the value of h while guaranteeing delivery of the messages within a constant factor of the optimal time. In this paper we describe protocols that meet the most stringent efficiency requirement, namely delivery of messages within a time that exceeds the best achievable by only a lower-order additive term. Such protocols are called 1-optimal. While these protocols achieve 1-optimality only for heavily loaded networks, that is, for large values of h, they are remarkable for their simplicity in that they use only the total-exchange communication primitive. The total-exchange can be realized in many networks using very simple, contention-free, and extremely efficient schemes. The technical contribution of this paper is a protocol to route random h-relations in an N-processor network using $\frac{h}{N}(1+o(1)) + O(\log \log N)$ total-exchange rounds with high probability. Using message duplication, we can improve the bound to $\frac{h}{N}(1+o(1)) + O(\log^* N)$. This improves upon the $\frac{h}{N}(1+o(1)) + O(\log N)$ bound of Gerbessiotis and Valiant.
While our theoretical improvements are modest, our experimental results show an improvement over the protocol of A. Gerbessiotis and L. G. Valiant.
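To make the h/N baseline concrete, the following Python sketch builds a random h-relation and counts the total-exchange rounds a naive direct scheme would need. It is not the protocol described in the abstract. It assumes that one total-exchange round carries at most one message per ordered (source, destination) pair, and that a random h-relation is a uniformly shuffled assignment in which every processor sends and receives exactly h messages. The function names are hypothetical and chosen for illustration only.

```python
import random
from collections import Counter


def random_h_relation(n, h, seed=0):
    """Return relation[i] = list of destinations of processor i's h messages.

    Assumption: destinations are a uniform shuffle of h copies of each
    processor id, so every processor also receives exactly h messages.
    """
    rng = random.Random(seed)
    dests = [d for d in range(n) for _ in range(h)]
    rng.shuffle(dests)
    return [dests[i * h:(i + 1) * h] for i in range(n)]


def direct_rounds(relation):
    """Rounds needed when every message is sent straight to its destination.

    Assumption: a total-exchange round moves at most one message per
    (source, destination) pair, so the most loaded pair sets the count.
    """
    worst = 0
    for dsts in relation:
        counts = Counter(dsts)
        worst = max(worst, max(counts.values(), default=0))
    return worst


def round_lower_bound(n, h):
    """ceil(h / n): a processor can send at most n messages per round."""
    return -(-h // n)


if __name__ == "__main__":
    n, h = 16, 1024
    rel = random_h_relation(n, h)
    print("lower bound:", round_lower_bound(n, h))
    print("direct rounds:", direct_rounds(rel))
```

Under these assumptions the direct scheme can never use fewer than ceil(h/N) rounds, and the gap between the two printed numbers is the imbalance that a routing protocol aims to reduce to a lower-order additive term.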