Imaging the earth's interior
FFTs in external or hierarchical memory
The Journal of Supercomputing
CCL: A Portable and Tunable Collective Communication Library for Scalable Parallel Computers
IEEE Transactions on Parallel and Distributed Systems
Using MPI: portable parallel programming with the message-passing interface
Matrix transpose for block allocations on torus and de Bruijn networks
Journal of Parallel and Distributed Computing
Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems
IEEE Transactions on Parallel and Distributed Systems
Optimization of MPI collectives on clusters of large-scale SMP's
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
All-to-All Personalized Communication in Multidimensional Torus and Mesh Networks
IEEE Transactions on Parallel and Distributed Systems
A comparison of optimal FFTs on torus and hypercube multicomputers
Parallel Computing
Spotlight-Mode Synthetic Aperture Radar: A Signal Processing Approach
The Scalability of FFT on Parallel Computers
IEEE Transactions on Parallel and Distributed Systems
CANPC '98 Proceedings of the Second International Workshop on Network-Based Parallel Computing: Communication, Architecture, and Applications
Improved MPI All-to-all Communication on a Giganet SMP Cluster
Proceedings of the 9th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Message passing and shared address space parallelism on an SMP cluster
Parallel Computing
The Performance Advantages of Integrating Message Passing in Cache-Coherent Multiprocessors
An efficient parallel-processing method for transposing large matrices in place
IEEE Transactions on Image Processing
Matrix transposition on parallel systems typically requires costly all-to-all communication. In this paper, we provide a comparative characterization of efficient algorithms for transposing small and large matrices on the popular symmetric multiprocessor (SMP) architecture, whose large aggregate bandwidth and low-latency inter-process communication keep communication costs relatively low. We analyze the data send/receive costs and the memory requirements of these matrix-transpose algorithms. We then propose an adaptive algorithm that minimizes the overhead of the transpose operation given parameters such as the data size, the number of processors, the message start-up time, and the effective communication bandwidth.
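The adaptive selection described above can be sketched with a simple latency-bandwidth cost model. The functions and constants below are illustrative assumptions, not the paper's actual formulas: they compare a direct pairwise all-to-all (each process exchanges one block with every other process) against a Bruck-style combined all-to-all (log2(p) rounds, each moving half the local data), and pick whichever is cheaper for the given machine parameters.

```python
import math

BYTES_PER_ELEM = 8  # assume double-precision matrix elements


def direct_cost(n, p, t_s, bw):
    """Direct all-to-all: p-1 exchanges, each carrying an n/p^2-element block.

    n  = total matrix elements, p = number of processes,
    t_s = per-message start-up time (s), bw = bandwidth (bytes/s).
    """
    msg_bytes = BYTES_PER_ELEM * n / (p * p)
    return (p - 1) * (t_s + msg_bytes / bw)


def combined_cost(n, p, t_s, bw):
    """Bruck-style all-to-all: ceil(log2 p) rounds, each moving half the
    local n/p elements, trading extra data volume for fewer messages."""
    msg_bytes = BYTES_PER_ELEM * n / (2 * p)
    return math.ceil(math.log2(p)) * (t_s + msg_bytes / bw)


def choose_transpose(n, p, t_s, bw):
    """Adaptive choice: return the name of the cheaper strategy."""
    if direct_cost(n, p, t_s, bw) <= combined_cost(n, p, t_s, bw):
        return "direct"
    return "combined"
```

Under this model, small matrices on many processors favor the combined algorithm (start-up time dominates, so fewer messages win), while large matrices favor the direct algorithm (bandwidth dominates, so moving each element only once wins); this is the kind of crossover an adaptive scheme exploits.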