Array redistribution is used in languages such as High Performance Fortran to allow programmers to dynamically change the distribution of arrays across processors. Distributed-memory implementations of several scientific applications require array redistribution. In this paper, efficient methods for performing array redistribution are presented. Precise closed forms for determining the processors involved in the communication and the data elements to be communicated are developed for two special cases of array redistribution involving block-cyclically distributed arrays. The general array redistribution problem involving block-cyclically distributed arrays can be expressed in terms of these special cases. Using the closed forms, a cost model for estimating the communication overhead for array redistribution is developed. A multi-phase approach for reducing the communication cost of array redistribution is presented. Experimental results on the Cray T3D to evaluate the multi-phase approach are provided.
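To make the terms concrete, the following is a minimal brute-force sketch of what redistributing a block-cyclically distributed array involves. It is not the closed-form method described above: it simply enumerates every global index and records which processor owns it before and after the change. The function names `owner` and `redistribution_sets`, the 16-element array, the two processors, and the block sizes 4 and 2 are all illustrative assumptions, not values taken from the paper.

```python
from collections import defaultdict


def owner(i, block, nprocs):
    """Processor that owns global index i under a cyclic(block) distribution.

    Blocks of `block` consecutive elements are dealt to processors
    0, 1, ..., nprocs - 1 in round-robin order.
    """
    return (i // block) % nprocs


def redistribution_sets(n, nprocs, src_block, dst_block):
    """Map each (source, destination) processor pair to the indices it moves.

    Brute-force enumeration over all n elements (hypothetical helper for
    illustration); a closed form would obtain the same sets without
    visiting every element.
    """
    moves = defaultdict(list)
    for i in range(n):
        src = owner(i, src_block, nprocs)
        dst = owner(i, dst_block, nprocs)
        moves[(src, dst)].append(i)
    return dict(moves)


# Illustrative case: 16 elements on 2 processors, cyclic(4) -> cyclic(2).
sets = redistribution_sets(n=16, nprocs=2, src_block=4, dst_block=2)
for (src, dst), indices in sorted(sets.items()):
    print(f"P{src} -> P{dst}: {indices}")
```

In this small case every processor exchanges data with every processor, and each message consists of regularly strided runs of elements. That regularity is the kind of structure that closed forms can describe without per-element enumeration.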