Generating communication for array statements: design, implementation, and evaluation
Journal of Parallel and Distributed Computing - Special issue on data parallel algorithms and programming
An approach to communication-efficient data redistribution
ICS '94 Proceedings of the 8th international conference on Supercomputing
Static and Run-Time Algorithms for All-to-Many Personalized Communication on Permutation Networks
IEEE Transactions on Parallel and Distributed Systems
Processor Mapping Techniques Toward Efficient Data Redistribution
IEEE Transactions on Parallel and Distributed Systems
Automatic data layout for high performance Fortran
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Efficient address generation for block-cyclic distributions
ICS '95 Proceedings of the 9th international conference on Supercomputing
Dynamic data partitioning for distributed-memory multicomputers
Journal of Parallel and Distributed Computing - Special issue on compilation techniques for distributed memory systems
Optimizations for efficient array redistribution on distributed memory multicomputers
Journal of Parallel and Distributed Computing - Special issue on compilation techniques for distributed memory systems
Efficient index set generation for compiling HPF array statements on distributed-memory machines
Journal of Parallel and Distributed Computing - Special issue on compilation techniques for distributed memory systems
Scheduling Block-Cyclic Array Redistribution
IEEE Transactions on Parallel and Distributed Systems
A Basic-Cycle Calculation Technique for Efficient Dynamic Data Redistribution
IEEE Transactions on Parallel and Distributed Systems
Communication Generation for Aligned and Cyclic(K) Distributions Using Integer Lattice
IEEE Transactions on Parallel and Distributed Systems
Efficient Algorithms for Block-Cyclic Array Redistribution Between Processor Sets
IEEE Transactions on Parallel and Distributed Systems
CP-PACS: a massively parallel processor at the University of Tsukuba
Parallel Computing - Special Anniversary issue
A Generalized Basic-Cycle Calculation Method for Efficient Array Redistribution
IEEE Transactions on Parallel and Distributed Systems
Efficient Algorithms for Array Redistribution
IEEE Transactions on Parallel and Distributed Systems
Efficient Algorithms for Multi-dimensional Block-Cyclic Redistribution of Arrays
ICPP '97 Proceedings of the international Conference on Parallel Processing
Automatic Selection of Dynamic Data Partitioning Schemes for Distributed-Memory Multicomputers
LCPC '95 Proceedings of the 8th International Workshop on Languages and Compilers for Parallel Computing
Automatic Data Layout Using 0-1 Integer Programming
PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
Many-to-many personalized communication with bounded traffic
FRONTIERS '95 Proceedings of the Fifth Symposium on the Frontiers of Massively Parallel Computation (Frontiers'95)
Improving Performance of Multi-Dimensional Array Redistribution on Distributed Memory Machines
HIPS '98 Proceedings of the High-Level Parallel Programming Models and Supportive Environments
Efficient Algorithms for Block-Cyclic Redistribution of Arrays
SPDP '96 Proceedings of the 8th IEEE Symposium on Parallel and Distributed Processing (SPDP '96)
Reducing Communication Cost for Parallelizing Irregular Scientific Codes
PARA '02 Proceedings of the 6th International Conference on Applied Parallel Computing Advanced Scientific Computing
QR factorization for shared memory and message passing
Parallel Computing
A Divide-and-Conquer Algorithm for Irregular Redistribution in Parallelizing Compilers
The Journal of Supercomputing
Sparse Matrix Block-Cyclic Realignment on Distributed Memory Machines
The Journal of Supercomputing
Messages Scheduling for Parallel Data Redistribution between Clusters
IEEE Transactions on Parallel and Distributed Systems
Scheduling contention-free irregular redistributions in parallelizing compilers
The Journal of Supercomputing
A message passing strategy for array redistributions in a torus network
The Journal of Supercomputing
International Journal of Computer Mathematics
An efficient algorithm for irregular redistributions in parallelizing compilers
ISPA'03 Proceedings of the 2003 international conference on Parallel and distributed processing and applications
Optimizing communications of data parallel programs in scalable cluster systems
GPC'08 Proceedings of the 3rd international conference on Advances in grid and pervasive computing
ISPA'05 Proceedings of the Third international conference on Parallel and Distributed Processing and Applications
Localization techniques for cluster-based data grid
ICA3PP'05 Proceedings of the 6th international conference on Algorithms and Architectures for Parallel Processing
Optimizing scheduling stability for runtime data alignment
EUC'06 Proceedings of the 2006 international conference on Emerging Directions in Embedded and Ubiquitous Computing
Localized communications of data parallel programs on multi-cluster grid systems
EGC'05 Proceedings of the 2005 European conference on Advances in Grid Computing
Irregular redistribution scheduling by partitioning messages
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Optimizations of data distribution localities in cluster grid environments
ICCSA'05 Proceedings of the 2005 international conference on Computational Science and Its Applications - Volume Part IV
On the complexity of the max-edge-coloring problem with its variants
ESCAPE'07 Proceedings of the First international conference on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies
Hi-index | 0.00 |
Array redistribution is required often in programs on distributed memory parallel computers. It is essential to use efficient algorithms for redistribution; otherwise the performance of the programs will degrade considerably. The redistribution overheads consist of two parts: index computation and inter-processor communication. In this paper, by using a notation for the local data description called an LDD, we propose a framework to optimize the array redistribution algorithm both in index computation and inter-processor communication. That is, our work makes an effort to optimize not only the computation cost but also communication cost for array redistribution algorithms. We present an efficient index computation method and generate a schedule that minimizes the number of communication steps and eliminates node contention in each communication step. Some experiments show the efficiency and flexibility of our techniques.