The array redistribution problem occurs in many important applications in parallel computing. In this paper, we consider this problem in a torus network. Tori are preferred to other multidimensional networks (such as hypercubes) because they scale better (IEEE Trans. Parallel Distrib. Syst. 50(10), 1201-1218, 2001). We present a message-combining approach that splits any array redistribution problem into a series of broadcasts in which all sources send messages of the same size, so that a balanced traffic load is achieved. Unlike existing array redistribution algorithms, the scheme introduced in this work eliminates the need for data reorganization in the memory of the source and target processors. Moreover, the processing of the scheduled broadcasts is pipelined, which reduces the total cost of redistribution.
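To make the load-balancing concern concrete, the following minimal Python sketch counts how many elements each source processor would send to each target processor in one redistribution. It is an illustration of the problem only, not the scheme presented in the paper. It assumes a one-dimensional block-cyclic layout, which the abstract does not specify, and the names `owner` and `message_sizes` and all parameter values are hypothetical.

```python
from collections import Counter


def owner(i, block, nprocs):
    """Processor owning global element i under an assumed cyclic(block) layout."""
    return (i // block) % nprocs


def message_sizes(n, r, s, p, q):
    """Count the elements each source sends to each target when an n-element
    array moves from cyclic(r) on p processors to cyclic(s) on q processors."""
    sizes = Counter()
    for i in range(n):
        src = owner(i, r, p)  # owner before redistribution
        dst = owner(i, s, q)  # owner after redistribution
        sizes[(src, dst)] += 1
    return sizes


# Hypothetical example: 24 elements, cyclic(2) -> cyclic(3), 4 processors each.
sizes = message_sizes(n=24, r=2, s=3, p=4, q=4)
for (src, dst), count in sorted(sizes.items()):
    print(f"P{src} -> Q{dst}: {count} element(s)")
```

In this small case the direct source-to-target messages already differ in size (some carry one element, others two), which is the kind of imbalance that grouping transfers into equal-size messages is meant to avoid.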