We consider the parallel reduction of a real matrix to Hessenberg form using orthogonal transformations. Standard Hessenberg reduction algorithms reduce the columns of the matrix from left to right in either a blocked or unblocked fashion. However, the standard blocked variant still performs roughly 20% of its computations as matrix-vector multiplications, which are memory-bound and therefore limit performance. We show that a two-stage approach, with an intermediate reduction to block Hessenberg form, speeds up the reduction by avoiding these matrix-vector multiplications. We describe and evaluate a new high-performance implementation of the two-stage approach that attains significant speedups over the one-stage approach. The key components are a dynamically scheduled implementation of Stage 1 and a blocked, adaptively load-balanced implementation of Stage 2.
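To make the underlying operation concrete, the following is a minimal, hypothetical sketch of the classical unblocked Householder reduction to Hessenberg form in NumPy. It is not the paper's blocked or two-stage algorithm (production codes such as LAPACK's DGEHRD use blocked updates); it only illustrates the similarity transformation that all variants compute.

```python
import numpy as np

def hessenberg_reduce(A):
    """Unblocked Householder reduction of a real square matrix to upper
    Hessenberg form. Returns H = Q^T A Q for some orthogonal Q, so H has
    the same eigenvalues as A. Minimal illustrative sketch only.
    """
    H = np.array(A, dtype=float)
    n = H.shape[0]
    for k in range(n - 2):
        # Build a Householder reflector that annihilates H[k+2:, k],
        # leaving only the subdiagonal entry in column k.
        x = H[k + 1:, k].copy()
        alpha = -np.copysign(np.linalg.norm(x), x[0])
        v = x
        v[0] -= alpha
        nv = np.linalg.norm(v)
        if nv == 0.0:
            continue  # column already in the desired form
        v /= nv
        # Two-sided update (similarity transform): apply the reflector
        # from the left to rows k+1:, then from the right to columns k+1:.
        H[k + 1:, k:] -= 2.0 * np.outer(v, v @ H[k + 1:, k:])
        H[:, k + 1:] -= 2.0 * np.outer(H[:, k + 1:] @ v, v)
    return H
```

In this unblocked form, every column is eliminated with matrix-vector products against the full trailing submatrix; the blocked and two-stage variants discussed in the abstract exist precisely to replace most of that memory-bound work with matrix-matrix multiplications.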