This paper addresses the problem of parallel transposition of large out-of-core arrays. Although algorithms for out-of-core matrix transposition have been widely studied, previously proposed algorithms have sought to minimise the number of I/O operations and the in-memory permutation time. We propose an algorithm that directly targets the improvement of overall transposition time. The I/O characteristics of the system are used to determine the read, write and communication block sizes such that the total execution time is minimised. We also provide a solution to the array redistribution problem for arrays on disk. The solutions to the sequential transposition problem and the parallel array redistribution problem are then combined to obtain an algorithm for the parallel out-of-core transposition problem.
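The idea of deriving block sizes from the I/O characteristics of the system can be illustrated with a small sketch. It is not the paper's cost model: the `DiskModel` parameters, the per-request latency plus bandwidth cost formula, and the function names `transpose_time` and `best_tile` are all assumptions introduced here for illustration. The sketch considers a sequential transposition of an n x n row-major matrix through an in-memory tile of `rows` x `cols` elements, and searches for the tile shape with the lowest modelled total time.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class DiskModel:
    """Hypothetical I/O characteristics (assumed, not taken from the paper)."""
    read_latency: float     # seconds of fixed overhead per read request
    write_latency: float    # seconds of fixed overhead per write request
    read_bandwidth: float   # bytes per second
    write_bandwidth: float  # bytes per second


def transpose_time(n, rows, cols, elem_bytes, disk):
    """Modelled time to transpose an n x n matrix using rows x cols tiles.

    Each input row is read in ceil(n / cols) contiguous pieces, and each
    output row is written in ceil(n / rows) contiguous pieces.
    """
    read_ops = n * ceil(n / cols)
    write_ops = n * ceil(n / rows)
    volume = n * n * elem_bytes
    return (read_ops * disk.read_latency
            + write_ops * disk.write_latency
            + volume / disk.read_bandwidth
            + volume / disk.write_bandwidth)


def best_tile(n, memory_elems, elem_bytes, disk):
    """Exhaustively search tile shapes that fit in memory.

    Returns (modelled_time, rows, cols) for the cheapest shape found.
    """
    best = None
    for rows in range(1, min(n, memory_elems) + 1):
        cols = min(n, memory_elems // rows)
        t = transpose_time(n, rows, cols, elem_bytes, disk)
        if best is None or t < best[0]:
            best = (t, rows, cols)
    return best


# Example: reads and writes cost the same per request.
symmetric = DiskModel(0.01, 0.01, 100e6, 100e6)
time_s, tile_rows, tile_cols = best_tile(1024, 4096, 8, symmetric)
```

Under this simplified model, equal read and write costs favour a square tile, while a higher per-request write cost shifts the best shape towards more rows per tile, so that each write covers a longer contiguous run. A parallel version would additionally need a communication block size, which this sketch does not model.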