Run-time data redistribution can enhance algorithm performance on distributed-memory machines. Data can be explicitly redistributed between algorithm phases when a different data decomposition is expected to deliver better performance in a subsequent phase of computation. Redistribution, however, adds program overhead, because computation is suspended while data are exchanged among processor memories. In this paper, we present a technique that minimizes the amount of data exchanged in BLOCK to CYCLIC(c) (or vice versa) redistributions of arrays with an arbitrary number of dimensions. While preserving the semantics of the target (destination) distribution pattern, the technique manipulates the data-to-logical-processor mapping of the target pattern. When implemented on an IBM SP, the mapping technique improves redistribution performance by approximately 40% over the traditional data-to-processor mapping. Relative to the traditional mapping, the proposed method affords greater flexibility in specifying precisely which data elements are redistributed and which remain on-processor.
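The effect of remapping logical processors can be illustrated with a minimal one-dimensional sketch. It is not the paper's algorithm: the function names (`block_owner`, `cyclic_owner`, `greedy_mapping`, and so on) are hypothetical, and the greedy pairing below is a simple stand-in heuristic chosen for illustration. The sketch counts how many elements must change processors in a BLOCK to CYCLIC(c) redistribution, first when logical processor q of the target pattern is placed on physical processor q, and then when the logical processors are permuted so that more elements stay where they already are.

```python
def block_owner(i, n, p):
    """Physical processor owning element i under BLOCK over p processors."""
    block = -(-n // p)  # ceiling division: elements per block
    return i // block


def cyclic_owner(i, c, p):
    """Logical processor owning element i under CYCLIC(c) over p processors."""
    return (i // c) % p


def overlap_matrix(n, p, c):
    """m[s][q] = number of elements held by physical processor s (BLOCK)
    that belong to logical processor q in the CYCLIC(c) target."""
    m = [[0] * p for _ in range(p)]
    for i in range(n):
        m[block_owner(i, n, p)][cyclic_owner(i, c, p)] += 1
    return m


def elements_moved(m, mapping):
    """Elements that must be sent when logical q is placed on mapping[q]."""
    total = sum(sum(row) for row in m)
    kept = sum(m[mapping[q]][q] for q in range(len(m)))
    return total - kept


def greedy_mapping(m):
    """Assumed heuristic (not from the paper): repeatedly pair the logical and
    physical processors with the largest remaining overlap."""
    p = len(m)
    pairs = sorted(((m[s][q], s, q) for s in range(p) for q in range(p)),
                   reverse=True)
    mapping = [None] * p
    used = set()
    for _, s, q in pairs:
        if mapping[q] is None and s not in used:
            mapping[q] = s
            used.add(s)
    return mapping


if __name__ == "__main__":
    n, p, c = 16, 4, 2  # illustrative sizes only
    m = overlap_matrix(n, p, c)
    identity = list(range(p))
    remapped = greedy_mapping(m)
    print("identity mapping moves", elements_moved(m, identity), "elements")
    print("remapped", remapped, "moves", elements_moved(m, remapped), "elements")
```

Because the mapping only permutes which physical processor plays each logical role, every processor still holds a valid CYCLIC(c) share of the array; only the volume of data exchanged changes.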