Adapting linear algebra codes to the memory hierarchy using a hypermatrix scheme

Authors:
José R. Herrero; Juan J. Navarro
Affiliations:
Computer Architecture Dept., Univ. Politècnica de Catalunya, Barcelona, Spain; Computer Architecture Dept., Univ. Politècnica de Catalunya, Barcelona, Spain
Venue:
PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
Year:
2005

Abstract

We describe how we adapt data and computations to the underlying memory hierarchy by means of a hierarchical data structure known as a hypermatrix. Applying orthogonal block forms gave the best performance on the platforms used.
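
The abstract does not spell out the data structure, so the following is only a minimal sketch of what a hypermatrix might look like, under stated assumptions: upper levels hold pointers to submatrices, all-zero blocks are kept as null pointers, and the lowest level holds small dense blocks. The fixed 2 x 2 fan-out, the leaf size, the choice of Python, and every name used here (Node, build, add, multiply, to_dense) are hypothetical and not taken from the paper. The sketch does not attempt to illustrate orthogonal block forms.

```python
# Illustrative sketch only: all names and the 2 x 2 fan-out are assumptions,
# not the authors' implementation.
from dataclasses import dataclass
from typing import List, Optional, Union

Dense = List[List[float]]  # leaf data submatrix, stored row by row


@dataclass
class Node:
    # 2 x 2 grid of children; None marks an all-zero block that is not stored.
    children: List[List[Optional["Hyper"]]]


Hyper = Union[Node, Dense]


def build(a, r0, c0, size, leaf):
    """Convert the size x size block of `a` at (r0, c0) into a hypermatrix."""
    if all(a[r0 + i][c0 + j] == 0 for i in range(size) for j in range(size)):
        return None  # zero block: keep a null pointer instead of data
    if size <= leaf:
        return [row[c0:c0 + size] for row in a[r0:r0 + size]]
    h = size // 2
    return Node([[build(a, r0 + i * h, c0 + j * h, h, leaf) for j in range(2)]
                 for i in range(2)])


def add(x, y):
    """Return x + y; a null operand contributes nothing."""
    if x is None:
        return y
    if y is None:
        return x
    if not isinstance(x, Node):  # both operands are dense leaves
        return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]
    return Node([[add(x.children[i][j], y.children[i][j]) for j in range(2)]
                 for i in range(2)])


def multiply(x, y):
    """Return the product x * y, skipping work on null blocks."""
    if x is None or y is None:
        return None
    if not isinstance(x, Node):  # dense leaves: plain triple loop
        n = len(x)
        return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]
    out = [[None, None], [None, None]]
    for i in range(2):
        for j in range(2):
            for k in range(2):
                term = multiply(x.children[i][k], y.children[k][j])
                out[i][j] = add(out[i][j], term)
    return Node(out)


def to_dense(h, size):
    """Expand a hypermatrix back into a size x size list of rows."""
    if h is None:
        return [[0.0] * size for _ in range(size)]
    if not isinstance(h, Node):
        return [row[:] for row in h]
    half = size // 2
    b = [[to_dense(h.children[i][j], half) for j in range(2)] for i in range(2)]
    return [b[i][0][r] + b[i][1][r] for i in range(2) for r in range(half)]


a = [[1, 2, 0, 0], [3, 4, 0, 0], [0, 0, 0, 0], [0, 0, 5, 6]]
b = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 2]]
ha = build(a, 0, 0, 4, 2)  # 4 x 4 matrix, 2 x 2 dense leaves
hb = build(b, 0, 0, 4, 2)
product = to_dense(multiply(ha, hb), 4)
```

Both operands are built with the same matrix size and leaf size, so non-null blocks at the same level always have the same kind (both pointer nodes or both dense leaves). In a tuned code the leaf size would presumably be chosen so that the blocks taking part in a leaf product fit in a given cache level, and the leaf product would call an optimized dense kernel rather than a triple loop.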