Reshuffling the elements of a multidimensional array according to an index operation traditionally requires an auxiliary buffer of the same size as the original array. Here, we describe a new in-place algorithm based on vacancy tracking cycles that uses minimal memory access. It eliminates the buffer array and the associated copy-back, which speeds up the reshuffle significantly for large arrays. The algorithm can be parallelized with a multithreaded approach on shared-memory multiprocessor computers. On distributed-memory multiprocessor computers, the index reshuffle of a distributed multidimensional array amounts to a remapping of processor domains and is carried out by combining the local in-place algorithm with a global exchange algorithm. Implementation and test results on the CRAY T3E and IBM SP indicate the effectiveness of the algorithm.
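To make the idea of a vacancy tracking cycle concrete, the following minimal sketch applies it to the simplest index reshuffle, the transpose of a two-dimensional row-major array. It is an illustration under stated assumptions, not the implementation from the paper. The function name `transpose_in_place`, the flat-list storage, and the cycle-leader test used to avoid processing a cycle twice are all choices made here for brevity. The leader test in particular trades extra index arithmetic for zero bookkeeping memory and may differ from how the authors enumerate cycles.

```python
def transpose_in_place(a, rows, cols):
    """Transpose a row-major rows x cols matrix stored in the flat list `a`.

    Only one element is held outside the array at any time; no buffer of
    the array's size is allocated. Illustrative sketch, not the paper's code.
    """
    n = rows * cols
    if len(a) != n:
        raise ValueError("len(a) must equal rows * cols")
    if rows < 2 or cols < 2:
        return a  # a vector's flat storage is unchanged by transposition

    last = n - 1
    moved = 2   # indices 0 and n-1 never move
    start = 1
    while moved < n:
        # Assumed helper logic: process a cycle only from its smallest
        # index, so each cycle is handled exactly once.
        probe = (start * cols) % last
        is_leader = True
        while probe != start:
            if probe < start:
                is_leader = False
                break
            probe = (probe * cols) % last

        if is_leader:
            held = a[start]      # lifting this element opens a vacancy
            vacancy = start
            while True:
                # Index of the element that belongs in the vacancy.
                src = (vacancy * cols) % last
                moved += 1
                if src == start:
                    a[vacancy] = held   # the cycle closes
                    break
                a[vacancy] = a[src]     # fill the vacancy ...
                vacancy = src           # ... which moves to the source slot
        start += 1
    return a


matrix = [1, 2, 3,
          4, 5, 6]                      # 2 x 3, row-major
transpose_in_place(matrix, 2, 3)
print(matrix)                           # [1, 4, 2, 5, 3, 6], i.e. 3 x 2
```

Each element is written exactly once, directly into its final position, which is the sense in which the buffer and the copy-back disappear. The source index `(vacancy * cols) % (rows * cols - 1)` is the standard closed form for a row-major transpose. A general multidimensional reshuffle would substitute the index map of the permutation in question.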