Performance modeling and optimal block size selection for the small-bulge multishift QR algorithm

Authors: Yusaku Yamamoto
Affiliations: Nagoya University, Nagoya, Aichi, Japan
Venue: ISPA'06 Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications
Year: 2006

Abstract

The small-bulge multishift QR algorithm proposed by Braman, Byers and Mathias is one of the most efficient algorithms for computing the eigenvalues of nonsymmetric matrices on processors with hierarchical memory. However, to extract its full performance, the block size m must be chosen properly according to the target architecture and the matrix size n. In this paper, we construct a performance model for this algorithm. The model has a hierarchical structure that mirrors the structure of the algorithm itself. Given n, m, and performance data for the basic components of the algorithm, such as the level-3 BLAS routines and the double implicit shift QR routine, it predicts the total execution time. Experiments on SMP machines with PowerPC G5 and Opteron processors show that the variation of the execution time as a function of m predicted by the model agrees well with the measurements. Thus our model can be used to automatically select the optimal value of m for a given matrix size on a given architecture.
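
The idea of model-driven block size selection can be illustrated with a minimal Python sketch. It is not the model from the paper: the function names `predict_total_time` and `select_block_size`, the two component cost functions, and all constants are hypothetical and chosen only to show the pattern. The sketch assumes that each basic component has a predictor mapping (n, m) to a time, that the top level simply sums these predictions, and that the best m is the candidate with the smallest predicted total.

```python
from typing import Callable, Dict, Iterable

# Hypothetical component model: maps (n, m) to a predicted time in seconds.
ComponentModel = Callable[[int, int], float]


def predict_total_time(n: int, m: int, components: Dict[str, ComponentModel]) -> float:
    """Top level of a toy hierarchical model: sum the component predictions."""
    return sum(model(n, m) for model in components.values())


def select_block_size(
    n: int, candidates: Iterable[int], components: Dict[str, ComponentModel]
) -> int:
    """Return the candidate m with the smallest predicted total time."""
    return min(candidates, key=lambda m: predict_total_time(n, m, components))


# Made-up cost functions for illustration only (not measured data).
toy_components: Dict[str, ComponentModel] = {
    # Assumed cost of computing shifts: cubic in m, repeated about n/m times.
    "shift_computation": lambda n, m: 1e-8 * m**3 * (n / m),
    # Assumed cost of level-3 BLAS updates: more efficient as m grows.
    "blas3_update": lambda n, m: 1e-9 * n**3 * (1.0 + 50.0 / m),
}

best_m = select_block_size(2000, [10, 20, 40, 80, 160, 320], toy_components)
print(best_m)
```

With these toy costs, a small m is penalized by inefficient updates and a large m by expensive shift computation, so the predicted time has an interior minimum. In practice the component predictors would be built from measured performance data on the target machine rather than from closed-form guesses.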