MOB forms: a class of multilevel block algorithms for dense linear algebra operations

Authors:
Juan J. Navarro;Toni Juan;Tomás Lang
Affiliations:
Computer Architecture Department, Universitat Politècnica de Catalunya, Gran Capità s/n, Mòdul D6, E-08034 Barcelona, Spain;Computer Architecture Department, Universitat Politècnica de Catalunya, Gran Capità s/n, Mòdul D6, E-08034 Barcelona, Spain;Department of Electrical and Computer Engineering, University of California at Irvine
Venue:
ICS '94 Proceedings of the 8th international conference on Supercomputing
Year:
1994

Abstract

Multilevel block algorithms exploit the data locality of linear algebra operations when executed on machines with several levels in the memory hierarchy. It is shown that the family we call Multilevel Orthogonal Block (MOB) algorithms is optimal and easy to design, and that the multilevel approach produces significant performance improvements. The effects of cache interference, TLB misses, and page faults are also considered. The multilevel block algorithms are evaluated analytically for an ideal memory system with M cache levels and no interference. Moreover, experimental results for the MOB forms on some current high-performance workstations are presented.
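
To make the idea of blocking at more than one level concrete, the following is a minimal sketch of a generic two-level blocked matrix multiplication. It is not the MOB form defined in the paper: the function names `matmul_two_level` and `matmul_naive`, the block sizes `outer` and `inner`, and the loop orders are all assumptions chosen for illustration. The outer blocks stand in for a larger, slower level of the memory hierarchy and the inner blocks for a smaller, faster one.

```python
def matmul_naive(A, B, n):
    """Reference product of two n x n matrices stored as lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matmul_two_level(A, B, n, outer=8, inner=4):
    """Compute C = A * B using two nested levels of blocking.

    `outer` and `inner` are hypothetical block sizes; on a real machine
    they would be chosen from the capacities of two memory levels.
    """
    C = [[0.0] * n for _ in range(n)]
    # Level 2: walk over outer blocks of the i, k, and j index ranges.
    for ii in range(0, n, outer):
        for kk in range(0, n, outer):
            for jj in range(0, n, outer):
                # Level 1: walk over inner blocks inside the outer block.
                for i0 in range(ii, min(ii + outer, n), inner):
                    for k0 in range(kk, min(kk + outer, n), inner):
                        for j0 in range(jj, min(jj + outer, n), inner):
                            # Kernel: plain triple loop on one inner block.
                            for i in range(i0, min(i0 + inner, ii + outer, n)):
                                for k in range(k0, min(k0 + inner, kk + outer, n)):
                                    a = A[i][k]
                                    for j in range(j0, min(j0 + inner, jj + outer, n)):
                                        C[i][j] += a * B[k][j]
    return C


if __name__ == "__main__":
    n = 10
    A = [[i + j for j in range(n)] for i in range(n)]
    B = [[i - 2 * j for j in range(n)] for i in range(n)]
    # Block sizes that do not divide n exercise the boundary handling.
    print(matmul_two_level(A, B, n, outer=8, inner=3) == matmul_naive(A, B, n))
```

Each index triple (i, k, j) is visited exactly once, so the result matches the unblocked product; only the order of the memory accesses changes. Pure Python does not show the performance effect, so the sketch illustrates the loop structure only.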