In-place transposition of rectangular matrices

Authors:
Fred G. Gustavson;Tadeusz Swirszcz
Affiliations:
IBM T. J. Watson Research Center, Yorktown Heights, NY;Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
Venue:
PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
Year:
2006

References (8):
Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms. IBM Journal of Research and Development.
Permuting In Place. SIAM Journal on Computing.
A Method for Transposing a Matrix. Journal of the ACM (JACM).
Algorithm 467: Matrix Transposition in Place. Communications of the ACM.
Algorithm 513: Analysis of In-Situ Transposition [F1]. ACM Transactions on Mathematical Software (TOMS).
Remark on “Algorithm 513: Analysis of In-Situ Transposition [F1]” and Remark on “Algorithm 467: Matrix Transposition in Place [F1]”. ACM Transactions on Mathematical Software (TOMS).
Algorithm 380: in-situ transposition of a rectangular matrix [F1]. Communications of the ACM.
Algorithm 302: Transpose vector stored array. Communications of the ACM.

Cited by (5):
Parallel and Cache-Efficient In-Place Matrix Storage Format Conversion. ACM Transactions on Mathematical Software (TOMS).
Cache blocking. PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume Part I.
Cache blocking for linear algebra algorithms. PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I.
Level-3 Cholesky Factorization Routines Improve Performance of Many Cholesky Algorithms. ACM Transactions on Mathematical Software (TOMS).
Scaling LAPACK panel operations using parallel cache assignment. ACM Transactions on Mathematical Software (TOMS).

Abstract

We present a new, efficient algorithm for in-place transposition of a rectangular m by n matrix A. In the worst case its cost is O(N log N), where N = mn. It can use a bit vector of IWORK words to further increase its efficiency; when IWORK = 0, no extra storage is used. We also review other existing algorithms for this problem, contributed by Gower, Windley, Knuth, Macleod, Laflin and Brebner (ACM Alg. 380), Brenner (ACM Alg. 467), and Cate and Twigg (ACM Alg. 513). Performance results are given and compared with those of an out-of-place transposition algorithm and of ACM Algorithm 467.
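
For orientation, the general idea of in-place transposition by cycle following can be sketched as below. This is a textbook-style illustration, not the authors' algorithm: the function name transpose_in_place and the flag use_bits are hypothetical, row-major storage in a flat list is assumed, and the flag only loosely mirrors the abstract's IWORK > 0 versus IWORK = 0 cases (a full bit vector with one bit per element, or no extra storage at all). The sketch makes no claim about matching the O(N log N) bound.

```python
def transpose_in_place(a, m, n, use_bits=True):
    """Transpose an m-by-n row-major matrix held in the flat list `a`, in place.

    Illustrative sketch only; names and the use_bits flag are assumptions.
    """
    size = m * n
    if len(a) != size:
        raise ValueError("len(a) must equal m * n")
    last = size - 1  # indices 0 and N-1 never move
    # One bit per element when use_bits is set; no extra storage otherwise.
    visited = bytearray((size + 7) // 8) if use_bits else None

    for start in range(1, last):
        if use_bits:
            if visited[start >> 3] & (1 << (start & 7)):
                continue  # already moved as part of an earlier cycle
        else:
            # Without bits, walk the cycle: `start` leads it only if no
            # smaller index is reached before returning to `start`.
            k = (start * m) % last
            while k > start:
                k = (k * m) % last
            if k < start:
                continue
        # Rotate the cycle: the element at index k belongs at (k * m) mod (N - 1).
        carried = a[start]
        k = start
        while True:
            k = (k * m) % last
            a[k], carried = carried, a[k]
            if use_bits:
                visited[k >> 3] |= 1 << (k & 7)
            if k == start:
                break
    return a


a = [1, 2, 3, 4, 5, 6]      # the 2-by-3 matrix [[1, 2, 3], [4, 5, 6]]
transpose_in_place(a, 2, 3)
print(a)                    # [1, 4, 2, 5, 3, 6], i.e. [[1, 4], [2, 5], [3, 6]]
```

With use_bits=False the sketch trades the bit vector for repeated cycle walks to decide whether an index starts a new cycle, which is the kind of storage versus work trade-off the abstract alludes to.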