Efficient reduction from block Hessenberg form to Hessenberg form using shared memory

  • Authors: Lars Karlsson; Bo Kågström
  • Affiliations: Department of Computing Science and HPC2N, Umeå University, Umeå, Sweden (both authors)
  • Venue: PARA'10: Proceedings of the 10th International Conference on Applied Parallel and Scientific Computing, Volume 2
  • Year: 2010

Abstract

A new cache-efficient algorithm for reduction from block Hessenberg form to Hessenberg form is presented and evaluated. The algorithm targets parallel computers with shared memory. One level of look-ahead in combination with a dynamic load-balancing scheme significantly reduces the idle time and allows the use of coarse-grained tasks. The coarse tasks lead to high-performance computations on each processor/core. Speedups close to 13 over the sequential unblocked algorithm have been observed on a dual quad-core machine using one thread per core.
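
To make the two matrix forms concrete, here is a minimal Python sketch. It is not the algorithm from the paper: it is a plain sequential, unblocked reduction built from Givens rotations, with no parallelism, look-ahead, or cache blocking. The names `lower_bandwidth` and `reduce_to_hessenberg`, the tolerance `tol`, and the test matrix are assumptions chosen for illustration. The sketch assumes the usual definitions: a Hessenberg matrix has one nonzero subdiagonal, and a block Hessenberg matrix with block size r has up to r nonzero subdiagonals.

```python
import numpy as np


def lower_bandwidth(A, tol=1e-12):
    """Return the index of the lowest subdiagonal of A holding an entry larger than tol."""
    n = A.shape[0]
    for k in range(n - 1, 0, -1):
        if np.any(np.abs(np.diag(A, -k)) > tol):
            return k
    return 0


def reduce_to_hessenberg(A, tol=1e-12):
    """Reduce a square matrix to Hessenberg form by orthogonal similarity.

    Illustrative sequential sketch (hypothetical helper, not the paper's method).
    Entries that are already zero are skipped, so a banded input needs fewer
    rotations than a full one.
    """
    H = np.array(A, dtype=float)
    n = H.shape[0]
    for j in range(n - 2):
        # Zero column j from the bottom up, stopping below the first subdiagonal.
        for i in range(n - 1, j + 1, -1):
            a, b = H[i - 1, j], H[i, j]
            if abs(b) <= tol:
                continue
            r = np.hypot(a, b)
            c, s = a / r, b / r
            G = np.array([[c, s], [-s, c]])  # maps (a, b) to (r, 0)
            # Similarity transformation: rotate rows, then the same columns.
            H[[i - 1, i], :] = G @ H[[i - 1, i], :]
            H[:, [i - 1, i]] = H[:, [i - 1, i]] @ G.T
            H[i, j] = 0.0  # remove rounding residue
    return H


# Hypothetical test matrix: block Hessenberg with r = 3 nonzero subdiagonals.
rng = np.random.default_rng(0)
A = np.triu(rng.standard_normal((8, 8)), -3)
H = reduce_to_hessenberg(A)
print(lower_bandwidth(A), lower_bandwidth(H))
```

Because every step is an orthogonal similarity transformation, `H` has the same eigenvalues as `A`. The column rotations create fill below the band, which is one reason practical algorithms for this reduction organize the work more carefully than this sketch does.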