An Efficient Algorithm for Large-Scale Matrix Transposition

  • Authors:
  • Jinwoo Suh;Viktor K. Prasanna

  • Venue:
  • ICPP '00 Proceedings of the 2000 International Conference on Parallel Processing
  • Year:
  • 2000

Abstract

Efficient transposition of large-scale matrices has been widely studied. These efforts have focused on reducing the number of I/O operations. However, in state-of-the-art architectures, data transfer time and index computation time are also significant components of the overall time. In this paper, we propose an algorithm that considers all these costs and reduces the overall execution time. The reduction is achieved by using two techniques: (1) writing the data onto disk in predefined patterns and (2) balancing the numbers of disk read and write operations. Even though our approach may increase the number of I/O operations in some cases, it results in an overall reduction in the execution time. Index computation, an expensive operation involving two divisions and a multiplication, is eliminated by partitioning the memory into two buffers. The expensive in-processor permutation is replaced by data collection operations. Our algorithm is analyzed using the well-known Linear Model and the Parallel Disk Model. The experimental results on a Sun Enterprise and a DEC Alpha show that our algorithm reduces the execution time by about 50% compared with the best known algorithms in the literature.
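
The abstract does not give the index formula itself. As a rough illustration of the per-element cost it describes, the sketch below uses the textbook mapping for a row-major matrix, which happens to need two divisions (a quotient and a remainder) and one multiplication. The function names, the in-memory list representation, and the formula are assumptions made for illustration; this is the naive baseline, not the authors' two-buffer algorithm.

```python
def transposed_index(k, rows, cols):
    """Map row-major index k of a rows x cols matrix to its
    row-major index in the cols x rows transpose.

    Assumed textbook formula, not taken from the paper.
    """
    i = k // cols          # division 1: source row
    j = k % cols           # division 2 (remainder): source column
    return j * rows + i    # multiplication: destination offset


def transpose_flat(data, rows, cols):
    """Transpose a flat row-major list by computing every index explicitly."""
    out = [None] * (rows * cols)
    for k, value in enumerate(data):
        out[transposed_index(k, rows, cols)] = value
    return out


if __name__ == "__main__":
    # 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] stored row by row.
    print(transpose_flat([1, 2, 3, 4, 5, 6], rows=2, cols=3))
    # prints [1, 4, 2, 5, 3, 6]
```

Because this arithmetic runs once per element, its cost grows with the matrix size, which is presumably why removing it matters when the matrix is too large to fit in memory.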