Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms
IBM Journal of Research and Development
SIAM Journal on Computing
A Method for Transposing a Matrix
Journal of the ACM (JACM)
Algorithm 467: Matrix Transposition in Place
Communications of the ACM
Algorithm 513: Analysis of In-Situ Transposition [F1]
ACM Transactions on Mathematical Software (TOMS)
ACM Transactions on Mathematical Software (TOMS)
Algorithm 380: in-situ transposition of a rectangular matrix [F1]
Communications of the ACM
Algorithm 302: Transpose vector stored array
Communications of the ACM
Parallel and Cache-Efficient In-Place Matrix Storage Format Conversion
ACM Transactions on Mathematical Software (TOMS)
PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume Part I
Cache blocking for linear algebra algorithms
PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I
Level-3 Cholesky Factorization Routines Improve Performance of Many Cholesky Algorithms
ACM Transactions on Mathematical Software (TOMS)
Scaling LAPACK panel operations using parallel cache assignment
ACM Transactions on Mathematical Software (TOMS)
Hi-index | 0.00 |
We present a new Algorithm for In-Place Rectangular Transposition of an m by n matrix A that is efficient. In worst case it is O(N log N) where N = mn. It uses a bit-vector of size IWORK words to further increase its efficiency. When IWORK=0 no extra storage is used. We also review some of the other existing algorithms for this problem. These contributions were made by Gower, Windley, Knuth, Macleod, Laffin and Brebner (ACM Alg. 380), Brenner (ACM Alg. 467), and Cate and Twigg (ACM Alg. 513). Performance results are given and they are compared to an Out-of-Place Transposition algorithm as well as ACM Algorithm 467.