In many scientific applications, dynamic array redistribution is often required to enhance the performance of an algorithm. In this paper, we present a generalized basic-cycle calculation (GBCC) method that efficiently redistributes an array from a BLOCK-CYCLIC(s) distribution over P processors to a BLOCK-CYCLIC(t) distribution over Q processors. In the GBCC method, each processor first computes the source/destination processor sets and data sets of the array elements in the first generalized basic-cycle of the local array it owns. A generalized basic-cycle is defined as $\operatorname{lcm}(sP, tQ)/(\gcd(s,t) \times P)$ in the source distribution and $\operatorname{lcm}(sP, tQ)/(\gcd(s,t) \times Q)$ in the destination distribution. From these processor and data sets, we construct packing/unpacking pattern tables that minimize data-movement operations. Every generalized basic-cycle has the same communication pattern, so a processor can use the packing/unpacking pattern tables to pack and unpack all of its array elements efficiently. To evaluate the performance of the GBCC method, we implemented it on an IBM SP2 parallel machine, along with the PITFALLS method and the ScaLAPACK method. Cost models for all three methods are also presented. The experimental results show that the GBCC method outperforms both the PITFALLS method and the ScaLAPACK method for all test samples. A brief description of how the GBCC method extends to multidimensional array redistribution is also presented.
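The following minimal Python sketch makes the generalized basic-cycle concrete and shows the idea of building a pattern table once and replaying it. It assumes 0-based global indices and that global element i belongs to processor (i // s) mod P under BLOCK-CYCLIC(s). It also assumes each processor stores its local elements in ascending global order. Let g = gcd(s, t). Because g divides both s and t, every aligned run of g local elements lies in a single source block and a single destination block. The source cycle length lcm(sP, tQ)/(g·P) therefore counts such runs. The helper names (`gbc_lengths`, `build_pack_table`, `pack`, `global_index`, `dest_proc`) are hypothetical and do not come from the paper. The sketch covers only send-side packing. It does not model the paper's unpacking tables, its cost models, or its SP2 implementation.

```python
from math import gcd
from typing import Dict, List, Tuple


def lcm(a: int, b: int) -> int:
    """Least common multiple (math.lcm only exists from Python 3.9)."""
    return a * b // gcd(a, b)


def gbc_lengths(s: int, P: int, t: int, Q: int) -> Tuple[int, int]:
    """Source/destination generalized basic-cycle lengths, per the formulas above,
    measured in runs of gcd(s, t) elements."""
    g, L = gcd(s, t), lcm(s * P, t * Q)
    return L // (g * P), L // (g * Q)


def global_index(p: int, j: int, block: int, nprocs: int) -> int:
    """Global index of local element j on processor p under BLOCK-CYCLIC(block)."""
    return ((j // block) * nprocs + p) * block + j % block


def dest_proc(i: int, block: int, nprocs: int) -> int:
    """Processor owning global element i under BLOCK-CYCLIC(block)."""
    return (i // block) % nprocs


def build_pack_table(p: int, s: int, P: int, t: int, Q: int) -> Dict[int, List[int]]:
    """Hypothetical helper: scan only the FIRST generalized basic-cycle of processor p
    and record, per destination q, the local start offsets of the g-element runs sent to q."""
    g = gcd(s, t)
    gbc_src, _ = gbc_lengths(s, P, t, Q)
    table: Dict[int, List[int]] = {q: [] for q in range(Q)}
    for c in range(gbc_src):
        j0 = c * g  # all g elements of this run share one destination processor
        table[dest_proc(global_index(p, j0, s, P), t, Q)].append(j0)
    return table


def pack(p: int, local: List, s: int, P: int, t: int, Q: int) -> Dict[int, List]:
    """Hypothetical helper: fill one send buffer per destination by replaying the
    first-cycle table at every cycle offset (each cycle has the same pattern)."""
    g = gcd(s, t)
    gbc_src, _ = gbc_lengths(s, P, t, Q)
    stride = gbc_src * g  # local elements per cycle, equal to lcm(sP, tQ) / P
    table = build_pack_table(p, s, P, t, Q)
    buffers: Dict[int, List] = {q: [] for q in range(Q)}
    ncycles = -(-len(local) // stride)  # ceiling division: include a partial last cycle
    for q, starts in table.items():
        for k in range(ncycles):
            for j0 in starts:
                for j in range(k * stride + j0, k * stride + j0 + g):
                    if j < len(local):  # the array may end mid-cycle
                        buffers[q].append(local[j])
    return buffers


if __name__ == "__main__":
    s, P, t, Q, N = 2, 3, 4, 2, 50
    print(gbc_lengths(s, P, t, Q))  # (4, 6)
    for p in range(P):
        # Use global indices as values so each buffer is easy to inspect.
        local = [i for i in range(N) if dest_proc(i, s, P) == p]
        print(p, pack(p, local, s, P, t, Q))
```

With BLOCK-CYCLIC(2) over 3 processors redistributed to BLOCK-CYCLIC(4) over 2 processors, the source cycle is 4 runs of 2 elements. A processor therefore evaluates the destination owner only 4 times and reuses that result for the rest of its array, instead of computing an owner for every element. This per-cycle reuse is the effect the pattern tables are meant to capture.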