OpenMP issues arising in the development of parallel BLAS and LAPACK libraries

Authors:
C. Addison_c;Y. Ren;M. van Waveren
Affiliations:
Department of Computer Science, University of Manchester, Manchester (Correspd. 66 Queens Avenue, Meols, Wirral, CH47 0NA, UK. Tel: +44 151 632 6615/ E-mail: caddison@addis0.fsnet.co.uk);Fujitsu European Centre for Information Technology, Hayes, UK;Fujitsu Systems Europe, Toulouse, France
Venue:
Scientific Programming - OpenMP
Year:
2003

Citing 6
Cited 5

The high performance Fortran handbook

The high performance Fortran handbook
ScaLAPACK user's guide

ScaLAPACK user's guide
LAPACK Users' guide (third ed.)

LAPACK Users' guide (third ed.)
Is data distribution necessary in OpenMP?

Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Extending OpenMP for NUMA machines

Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Automatically Tuned Linear Algebra Software

Automatically Tuned Linear Algebra Software

Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures

Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
SuperMatrix: a multithreaded runtime scheduling system for algorithms-by-blocks

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Programming matrix algorithms-by-blocks for thread-level parallelism

ACM Transactions on Mathematical Software (TOMS)
Managing the complexity of lookahead for LU factorization with pivoting

Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Toward scalable matrix multiply on multithreaded architectures

Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Dense linear algebra libraries need to cope efficiently with a range of input problem sizes and shapes. Inherently this means that parallel implementations have to exploit parallelism wherever it is present. While OpenMP allows relatively fine grain parallelism to be exploited in a shared memory environment it currently lacks features to make it easy to partition computation over multiple array indices or to overlap sequential and parallel computations. The inherent flexible nature of shared memory paradigms such as OpenMP poses other difficulties when it becomes necessary to optimise performance across successive parallel library calls. Notions borrowed from distributed memory paradigms, such as explicit data distributions help address some of these problems, but the focus on data rather than work distribution appears misplaced in an SMP context.