An efficient parallel-processing method for transposing large matrices in place

  • Authors: M. R. Portnoff
  • Affiliations: Lawrence Livermore Nat. Lab., CA
  • Venue: IEEE Transactions on Image Processing
  • Year: 1999

Abstract

We have developed an efficient algorithm for transposing large matrices in place. The algorithm is efficient because data are accessed either sequentially in blocks or randomly within blocks small enough to fit in cache, and because the same indexing calculations are shared among identical procedures operating on independent subsets of the data. This inherent parallelism makes the method well suited to a multiprocessor computing environment. The algorithm is easy to implement because the same two procedures are applied to the data in various groupings to carry out the complete transpose operation. Using only a single processor, we have demonstrated nearly an order-of-magnitude increase in speed over the previously published algorithm by Cate and Twigg (1977) for transposing a large rectangular matrix in place. With multiple processors operating in parallel, the processing speed increases almost linearly with the number of processors. A simplified version of the algorithm for square matrices is presented, as well as an extension for matrices large enough to require virtual memory.
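
To make the idea of cache-sized blocks concrete, the following is a minimal sketch of a generic tiled in-place transpose for a square matrix stored in row-major order. It is not the paper's procedure, which the abstract does not specify. The function name `transpose_square_in_place`, the flat-list storage, and the `block` parameter are all assumptions made for illustration.

```python
def transpose_square_in_place(a, n, block=32):
    """Transpose an n x n row-major matrix held in the flat list `a`, in place.

    Generic tiled approach (an illustrative assumption, not the published
    algorithm): the matrix is visited in block x block tiles so that each
    swap touches only a small, cache-friendly region.
    """
    for bi in range(0, n, block):
        i_end = min(bi + block, n)

        # Diagonal tile: swap each element above the diagonal with its mirror.
        for i in range(bi, i_end):
            for j in range(i + 1, i_end):
                a[i * n + j], a[j * n + i] = a[j * n + i], a[i * n + j]

        # Off-diagonal tiles to the right: swap tile (bi, bj) with tile (bj, bi).
        for bj in range(i_end, n, block):
            j_end = min(bj + block, n)
            for i in range(bi, i_end):
                for j in range(bj, j_end):
                    a[i * n + j], a[j * n + i] = a[j * n + i], a[i * n + j]
    return a


if __name__ == "__main__":
    n = 5
    m = list(range(n * n))
    transpose_square_in_place(m, n, block=2)
    for row in range(n):
        print(m[row * n:(row + 1) * n])
```

Each diagonal tile and each mirrored pair of off-diagonal tiles touches a disjoint set of elements, so in principle these units could be handed to separate workers. The rectangular case is harder because elements move along permutation cycles rather than in simple pairwise swaps, and this sketch does not attempt it.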