Parallel two-stage reduction to Hessenberg form using dynamic scheduling on shared-memory architectures

  • Authors:
  • L. Karlsson; B. Kågström

  • Affiliations:
  • Department of Computing Science and HPC2N, Umeå University, SE-901 87 Umeå, Sweden (both authors)

  • Venue:
  • Parallel Computing
  • Year:
  • 2011

Abstract

We consider the parallel reduction of a real matrix to Hessenberg form using orthogonal transformations. Standard Hessenberg reduction algorithms reduce the columns of the matrix from left to right in either a blocked or unblocked fashion. However, the standard blocked variant performs 20% of its computations as memory-bound matrix-vector multiplications, which limits its performance. We show that a two-stage approach, consisting of an intermediate reduction to block Hessenberg form followed by a reduction of the block Hessenberg matrix to Hessenberg form, speeds up the overall reduction by avoiding these matrix-vector multiplications. We describe and evaluate a new high-performance implementation of the two-stage approach that attains significant speedups over the one-stage approach. The key components are a dynamically scheduled implementation of Stage 1 and a blocked, adaptively load-balanced implementation of Stage 2.
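
To make the problem concrete, here is a minimal, unblocked sketch of Hessenberg reduction via Householder reflections in Python/NumPy. It illustrates the classical one-stage reduction that the paper improves upon; it is not the authors' parallel two-stage implementation, and the function name `hessenberg_reduce` is chosen here purely for illustration.

```python
import numpy as np

def hessenberg_reduce(A):
    """Unblocked reduction of a real square matrix to upper Hessenberg form
    via Householder reflections: returns H with Q^T A Q = H for some
    orthogonal Q (Q is not accumulated in this sketch)."""
    H = np.array(A, dtype=float, copy=True)
    n = H.shape[0]
    for k in range(n - 2):
        # Build a Householder reflector that zeroes H[k+2:, k].
        x = H[k + 1:, k].copy()
        norm_x = np.linalg.norm(x)
        if norm_x == 0.0:
            continue  # column already reduced
        alpha = -np.sign(x[0]) * norm_x if x[0] != 0 else -norm_x
        v = x
        v[0] -= alpha          # v = x - alpha * e1 (sign chosen to avoid cancellation)
        v /= np.linalg.norm(v)
        # Apply the reflector from the left:  H <- (I - 2 v v^T) H
        H[k + 1:, k:] -= 2.0 * np.outer(v, v @ H[k + 1:, k:])
        # ...and from the right: H <- H (I - 2 v v^T), preserving similarity
        H[:, k + 1:] -= 2.0 * np.outer(H[:, k + 1:] @ v, v)
        # Clean up rounding noise below the first subdiagonal in column k
        H[k + 2:, k] = 0.0
    return H

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 6))
    H = hessenberg_reduce(A)
    print(np.allclose(np.triu(H, -1), H))        # H is upper Hessenberg
    print(np.isclose(np.trace(A), np.trace(H)))  # similarity preserves the trace
```

In the blocked one-stage algorithm, part of each panel update must still be computed with matrix-vector products, because the trailing columns needed to form the next reflector are not yet updated; that memory-bound fraction is what the intermediate reduction to block Hessenberg form in the two-stage approach is designed to sidestep.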