When matrix computations are expressed in conventional programming languages, matrices are almost exclusively represented by arrays, but arrays are also used to represent many other kinds of entities, such as grids, lists, and hash tables. The responsibility for achieving efficient matrix computations is usually seen as resting on compilers, which apply loop restructuring and reordering transformations to adapt programs and program fragments to different target architectures. Unfortunately, compilers are often unable to restructure conventional algorithms for matrix computations into their block or block-recursive counterparts, which are required to obtain acceptable levels of performance on most current (and future) hardware systems. We present a datatype dedicated to the representation of dense matrices. In contrast to arrays, for which index-based element reference is the basic (primitive) operation, the primitive operations of our specialized matrix type are composition of a matrix from submatrices and decomposition of a matrix into submatrices. Decomposition of a matrix into submatrices (of unspecified sizes) is a key operation in the development of block algorithms for matrix computations. By expressing such (ambiguous) decompositions directly and explicitly, block algorithms can themselves be expressed explicitly, while the task of finding good decomposition parameters (i.e., block sizes) for each specific target system is exposed to, and made suitable for, compilers.
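The idea can be illustrated with a minimal sketch, here in Python rather than the paper's own formulation; all names (`Matrix`, `compose`, `decompose`, `blocked_mul`) are illustrative, not the paper's API. The datatype's primitives are composition of a matrix from a 2x2 arrangement of submatrices and the inverse decomposition, and a block-recursive matrix multiply is written directly in terms of them; the split points passed to `decompose` are exactly the free block-size parameters the abstract refers to.

```python
class Matrix:
    """Dense matrix whose primitive operations are (de)composition of submatrices."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]
        self.m = len(self.rows)                      # number of rows
        self.n = len(self.rows[0]) if self.rows else 0  # number of columns

    @staticmethod
    def compose(a11, a12, a21, a22):
        """Compose a matrix from a 2x2 block of conforming submatrices."""
        top = [r1 + r2 for r1, r2 in zip(a11.rows, a12.rows)]
        bot = [r1 + r2 for r1, r2 in zip(a21.rows, a22.rows)]
        return Matrix(top + bot)

    def decompose(self, i, j):
        """Split into four submatrices at row i and column j.

        The split points i and j are unspecified by the algorithm itself:
        they are the block-size parameters a compiler (or autotuner) may
        choose per target system.
        """
        a11 = Matrix([r[:j] for r in self.rows[:i]])
        a12 = Matrix([r[j:] for r in self.rows[:i]])
        a21 = Matrix([r[:j] for r in self.rows[i:]])
        a22 = Matrix([r[j:] for r in self.rows[i:]])
        return a11, a12, a21, a22


def add(a, b):
    """Elementwise sum of two conforming matrices."""
    return Matrix([[x + y for x, y in zip(ra, rb)]
                   for ra, rb in zip(a.rows, b.rows)])


def blocked_mul(a, b, nb=2):
    """Block-recursive matrix product expressed via decompose/compose."""
    # Base case: blocks at or below the cutoff are multiplied directly.
    if a.m <= nb or a.n <= nb or b.n <= nb:
        return Matrix([[sum(a.rows[i][k] * b.rows[k][j] for k in range(a.n))
                        for j in range(b.n)] for i in range(a.m)])
    # Recursive case: split both operands and combine the block products
    # C11 = A11*B11 + A12*B21, C12 = A11*B12 + A12*B22, etc.
    i, j, k = a.m // 2, b.n // 2, a.n // 2
    a11, a12, a21, a22 = a.decompose(i, k)
    b11, b12, b21, b22 = b.decompose(k, j)
    return Matrix.compose(
        add(blocked_mul(a11, b11, nb), blocked_mul(a12, b21, nb)),
        add(blocked_mul(a11, b12, nb), blocked_mul(a12, b22, nb)),
        add(blocked_mul(a21, b11, nb), blocked_mul(a22, b21, nb)),
        add(blocked_mul(a21, b12, nb), blocked_mul(a22, b22, nb)),
    )
```

Note that `blocked_mul` never indexes individual elements outside the base case: the algorithm is stated purely in terms of the composition/decomposition primitives, which is what leaves the choice of block sizes open to the compiler rather than baked into loop bounds.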