The relevance of new data structure approaches for dense linear algebra in the new multi-core/many core environments

Authors:
Fred G. Gustavson
Affiliations:
IBM T.J. Watson Research Center, Yorktown Heights, NY
Venue:
PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics
Year:
2007

Citing 14
Cited 2

A set of level 3 basic linear algebra subprograms

ACM Transactions on Mathematical Software (TOMS)
Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms

IBM Journal of Research and Development
Matrix computations (3rd ed.)

Matrix computations (3rd ed.)
Recursion leads to automatic variable blocking for dense linear-algebra algorithms

IBM Journal of Research and Development
LAPACK Users' guide (third ed.)

LAPACK Users' guide (third ed.)
A recursive formulation of Cholesky factorization of a matrix in packed storage

ACM Transactions on Mathematical Software (TOMS)
New Generalized Data Structures for Matrices Lead to a Variety of High Performance Algorithms

PPAM '01 Proceedings of the th International Conference on Parallel Processing and Applied Mathematics-Revised Papers
Tiling, Block Data Layout, and Memory Hierarchy Performance

IEEE Transactions on Parallel and Distributed Systems
High-performance linear algebra algorithms using new generalized data structures for matrices

IBM Journal of Research and Development
A fully portable high performance minimal storage hybrid format Cholesky algorithm

ACM Transactions on Mathematical Software (TOMS)
Minimal-storage high-performance Cholesky factorization via blocking and recursion

IBM Journal of Research and Development
Design and exploitation of a high-performance SIMD floating-point unit for Blue Gene/L

IBM Journal of Research and Development
Minimal data copy for dense linear algebra factorization

PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
New generalized data structures for matrices lead to a variety of high performance dense linear algebra algorithms

PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing

Cache blocking

PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume Part I
Level-3 Cholesky Factorization Routines Improve Performance of Many Cholesky Algorithms

ACM Transactions on Mathematical Software (TOMS)

Quantified Score

Hi-index	0.00

Visualization

Abstract

For about ten years now, Bo Kågström's Group in Umea, Sweden, Jerzy Wasniewski's Team at Danish Technical University in Lyngby, Denmark, and I at IBM Research in Yorktown Heights have been applying recursion and New Data Structures (NDS) to increase the performance of Dense Linear Algebra (DLA) factorization algorithms. Later, John Gunnels, and later still, Jim Sexton, both now at IBM Research also began working in this area. For about three years now almost all computer manufacturers have dramatically changed their computer architectures which they call Multi-Core, (MC). It turns out that these new designs give poor performance for the traditional designs of DLA libraries such as LAPACK and ScaLAPACK. Recent results of Jack Dongarra's group at the Innovative Computing Laboratory in Knoxville, Tennessee have shown how to obtain high performance for DLA factorization algorithms on the Cell architecture, an example of an MC processor, but only when they used NDS. In this talk we will give some reasons why this is so.