The input/output complexity of sorting and related problems
Communications of the ACM
Introduction to parallel computing: design and analysis of algorithms
Efficient transposition algorithms for large matrices
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
RAID: high-performance, reliable secondary storage
ACM Computing Surveys (CSUR)
Optimal read-once parallel disk scheduling
Proceedings of the sixth workshop on I/O in parallel and distributed systems
Multidimensional Digital Signal Processing
Hierarchical tiling for improved superscalar performance
IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
Portable Implementation of Real-Time Signal Processing Benchmarks on HPC Platforms
PARA '98 Proceedings of the 4th International Workshop on Applied Parallel Computing, Large Scale Scientific and Industrial Problems
An Improved Parallel Disk Scheduling Algorithm
HIPC '98 Proceedings of the Fifth International Conference on High Performance Computing
Data Management for Large-Scale Scientific Computations in High Performance Distributed Systems
HPDC '99 Proceedings of the 8th IEEE International Symposium on High Performance Distributed Computing
Design, Implementation and Evaluation of Parallel Pipelined STAP on Parallel Computers
IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
On Transposing Large 2^n x 2^n Matrices
IEEE Transactions on Computers
A Fast Computer Method for Matrix Transposing
IEEE Transactions on Computers
A Method for Transposing Externally Stored Matrices
IEEE Transactions on Computers
A Generalization of Eklundh's Algorithm for Transposing Large Matrices
IEEE Transactions on Computers
Efficient transposition of large-scale matrices has been widely studied. These efforts have focused on reducing the number of I/O operations. However, on state-of-the-art architectures, data transfer time and index computation time are also significant components of the overall time. In this paper, we propose an algorithm that considers all of these costs and reduces the overall execution time. The reduction is achieved by using two techniques: (1) writing the data onto disk in predefined patterns and (2) balancing the number of disk read and write operations. Even though our approach may increase the number of I/O operations in some cases, it results in an overall reduction in the execution time. The index computation, an expensive operation involving two divisions and a multiplication, is eliminated by partitioning the memory into two buffers. The expensive in-processor permutation is replaced by data collection operations. Our algorithm is analyzed using the well-known Linear Model and the Parallel Disk Model. The experimental results on a Sun Enterprise and a DEC Alpha show that our algorithm reduces the execution time by about 50%, compared with the best known algorithms in the literature.
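The cost the abstract attributes to index computation, and the blocked two-buffer idea that avoids it, can be illustrated with a minimal in-memory sketch. This is not the authors' algorithm (which operates on disk-resident data); the function names, the tile size b, and the flat-list layout are assumptions for illustration only. The naive version pays a division, a modulo, and a multiplication per element; the blocked version gathers a tile into a collection buffer and writes it out with stride arithmetic at block granularity.

```python
# Illustrative sketch only, not the paper's out-of-core algorithm.
# Matrices are n x m, stored row-major in a flat Python list.

def transpose_naive(src, n, m):
    """Per-element index computation: for each linear index i, finding the
    destination costs two integer divisions (// and %) and a multiplication."""
    dst = [0] * (n * m)
    for i in range(n * m):
        row, col = i // m, i % m      # two division-type operations
        dst[col * n + row] = src[i]   # one multiplication
    return dst

def transpose_two_buffer(src, n, m, b):
    """Blocked transpose with a small collection buffer (the tile).
    Index arithmetic happens once per b x b tile, not once per element;
    within a tile, destinations advance by fixed strides.
    Assumes b divides both n and m."""
    dst = [0] * (n * m)
    for bi in range(0, n, b):
        for bj in range(0, m, b):
            # Data collection: copy tile rows contiguously into a buffer.
            tile = [src[(bi + r) * m + bj : (bi + r) * m + bj + b]
                    for r in range(b)]
            # Write the transposed tile using stride arithmetic only.
            base = bj * n + bi
            for c in range(b):
                for r in range(b):
                    dst[base + c * n + r] = tile[r][c]
    return dst
```

Both routines produce the same result; the blocked version amortizes the index arithmetic over b*b elements, which is the spirit of replacing per-element permutation with data collection operations.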