Cache blocking

Authors:
Fred G. Gustavson
Affiliations:
IBM T.J. Watson Research Center, Emeritus, Poland
Venue:
PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume Part I
Year:
2010

Abstract

Over the past five years, almost all computer manufacturers have dramatically changed their computer architectures to Multicore (MC) processors. We briefly describe Cache Blocking as it relates to computer architectures since about 1985, covering the where, when, how, and why of Cache Blocking for dense linear algebra. It will be seen that the arrangement in memory of the submatrices A_ij of A that are being processed is very important.
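
To illustrate why the arrangement in memory of the submatrices matters, here is a minimal sketch that is not taken from the paper. It assumes a square n x n matrix stored as a flat row-major list and a block size nb that divides n. The function names (to_block_layout, from_block_layout, blocked_matmul) are hypothetical and chosen only for this example. The idea shown is one common form of cache blocking: each nb x nb submatrix is first copied into contiguous storage, so the inner kernel touches only three small contiguous blocks at a time.

```python
def to_block_layout(a, n, nb):
    """Copy a row-major n x n matrix (flat list) into block-contiguous storage.

    Block (bi, bj) occupies nb*nb consecutive entries, stored row-major
    inside the block. Assumption for this sketch: nb divides n.
    """
    nblk = n // nb
    out = [0.0] * (n * n)
    for bi in range(nblk):
        for bj in range(nblk):
            base = (bi * nblk + bj) * nb * nb
            for i in range(nb):
                for j in range(nb):
                    out[base + i * nb + j] = a[(bi * nb + i) * n + (bj * nb + j)]
    return out


def from_block_layout(a_blk, n, nb):
    """Inverse of to_block_layout: return the flat row-major matrix."""
    nblk = n // nb
    out = [0.0] * (n * n)
    for bi in range(nblk):
        for bj in range(nblk):
            base = (bi * nblk + bj) * nb * nb
            for i in range(nb):
                for j in range(nb):
                    out[(bi * nb + i) * n + (bj * nb + j)] = a_blk[base + i * nb + j]
    return out


def blocked_matmul(a_blk, b_blk, n, nb):
    """Compute C = A * B with A, B, and C all held in block layout."""
    nblk = n // nb
    bsz = nb * nb
    c_blk = [0.0] * (n * n)
    for bi in range(nblk):
        for bj in range(nblk):
            c0 = (bi * nblk + bj) * bsz
            for bk in range(nblk):
                a0 = (bi * nblk + bk) * bsz
                b0 = (bk * nblk + bj) * bsz
                # Kernel: C_ij += A_ik * B_kj on three contiguous nb x nb blocks.
                for i in range(nb):
                    for k in range(nb):
                        aik = a_blk[a0 + i * nb + k]
                        for j in range(nb):
                            c_blk[c0 + i * nb + j] += aik * b_blk[b0 + k * nb + j]
    return c_blk
```

In a compiled language, nb would typically be chosen so that the three blocks used by the kernel fit in a given cache level. Pure Python does not expose that effect, so the sketch shows only the data layout and loop structure, not the performance benefit.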