ACM Transactions on Mathematical Software (TOMS)
A set of level 3 basic linear algebra subprograms. ACM Transactions on Mathematical Software (TOMS).
The cache performance and optimizations of blocked algorithms. ASPLOS IV: Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems.
LAPACK: a portable linear algebra library for high-performance computers. Proceedings of the 1990 ACM/IEEE Conference on Supercomputing.
Advanced Compiler Design and Implementation.
Basic Linear Algebra Subprograms for Fortran Usage. ACM Transactions on Mathematical Software (TOMS).
Will C++ Be Faster than Fortran? ISCOPE '97: Proceedings of the Scientific Computing in Object-Oriented Parallel Environments.
The Role of Abstraction in High-Performance Computing. ISCOPE '97: Proceedings of the Scientific Computing in Object-Oriented Parallel Environments.
Optimizing Matrix Multiply using PHiPAC: a Portable, High-Performance, ANSI C Coding Methodology.
Automatically Tuned Linear Algebra Software.
A Generic C++ Framework for Parallel Mesh-Based Scientific Applications. HIPS '01: Proceedings of the 6th International Workshop on High-Level Parallel Programming Models and Supportive Environments.
Concept-Based Component Libraries and Optimizing Compilers. IPDPS '02: Proceedings of the 16th International Parallel and Distributed Processing Symposium.
On Materializations of Array-Valued Temporaries. LCPC '00: Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing, Revised Papers.
Code Generators for Automatic Tuning of Numerical Kernels: Experiences with FFTW. SAIG '00: Proceedings of the International Workshop on Semantics, Applications, and Implementation of Program Generation.
Concept Use or Concept Refinement: An Important Distinction in Building Generic Specifications. ICFEM '02: Proceedings of the 4th International Conference on Formal Engineering Methods: Formal Methods and Software Engineering.
Delayed Evaluation, Self-optimising Software Components as a Programming Model. Euro-Par '02: Proceedings of the 8th International Euro-Par Conference on Parallel Processing.
User-Extensible Simplification: Type-Based Optimizer Generators. CC '01: Proceedings of the 10th International Conference on Compiler Construction.
An Environment for Building Customizable Software Components. CD '02: Proceedings of the IFIP/ACM Working Conference on Component Deployment.
A survey of algorithmic skeleton frameworks: high-level structured parallel programming enablers. Software: Practice & Experience, Focus on Selected PhD Literature Reviews in the Practical Aspects of Software Technology.
Algorithm engineering: bridging the gap between algorithm theory and practice.
Proceedings of the 20th ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation.
DESOLA: An active linear algebra library using delayed evaluation and runtime code generation. Science of Computer Programming.
Efficient run-time dispatching in generic programming with minimal code bloat. Science of Computer Programming.
We present a unified approach for building high-performance numerical linear algebra routines for large classes of dense and sparse matrices. As in the Standard Template Library [1], we separate algorithms from data structures using generic programming techniques. Far from hindering high performance, this separation enables portable high-performance code, because the performance-critical code can be isolated from the algorithms and data structures. We also address the performance-portability problem for architecture-dependent algorithms such as matrix-matrix multiply. Recently, code generation systems such as PHiPAC [2] and ATLAS [3] have allowed such algorithms to be tuned to particular architectures. Our approach instead uses template metaprograms [4] to directly express the performance-critical, architecture-dependent sections of code.