Representation-transparent matrix algorithms with scalable performance

Authors:
Peter Gottschling;David S. Wise;Michael D. Adams
Affiliations:
Indiana University Bloomington, IN;Bloomington, IN;Indiana University, Bloomington, IN
Venue:
Proceedings of the 21st annual international conference on Supercomputing
Year:
2007

Abstract

We report positive results from new object-oriented tools for scientific programming. Template classes provide abstractions of matrix representations that subsume conventional row-major and column-major order, Z- and И-Morton order, and block-wise combinations of these. In addition, the design of the Matrix Template Library (MTL) has been independently extended with recursators, which support block-recursive algorithms and supplement MTL's iterators. Data types modeling both concepts let the programmer implement iterative algorithms, recursive algorithms, or hybrids of the two, written once for all of the aforementioned matrix representations and for a wide family of important scientific operations. We illustrate the unrestricted applicability of our matrix recursator with matrix multiplication: the same generic block-recursive function, unaltered, is instantiated on different triplets of matrix types. Within a base block, either a library multiplication or user-provided, platform-specific code provides excellent performance. We achieve 77% of peak performance using hand-tuned base cases without explicit prefetching. This performance is available across a wide family of matrix representations from a single program. The techniques generalize to other applications in linear algebra.
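
To make the idea of one block-recursive multiplication instantiated on different triplets of matrix types concrete, here is a minimal C++ sketch. It is not MTL code and does not use MTL's recursator interface: the types `RowMajor` and `ZMorton`, the helper `spread_bits`, and the functions `mult_rec` and `multiply` are all hypothetical names introduced for illustration. The sketch assumes square matrices whose order is a power of two, replaces the recursator abstraction with plain index offsets, and uses an untuned triple loop as the base case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row-major matrix of order n (illustration only, not an MTL type).
struct RowMajor {
    std::size_t n;
    std::vector<double> a;
    explicit RowMajor(std::size_t n_) : n(n_), a(n_ * n_, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return a[i * n + j]; }
    double operator()(std::size_t i, std::size_t j) const { return a[i * n + j]; }
};

// Spread the low 16 bits of x into the even bit positions of the result.
inline std::uint32_t spread_bits(std::uint32_t x) {
    x &= 0xFFFFu;
    x = (x | (x << 8)) & 0x00FF00FFu;
    x = (x | (x << 4)) & 0x0F0F0F0Fu;
    x = (x | (x << 2)) & 0x33333333u;
    x = (x | (x << 1)) & 0x55555555u;
    return x;
}

// Hypothetical Z-Morton matrix: row bits go to odd positions, column bits to
// even positions, so each 2x2 quadrant is visited in the order NW, NE, SW, SE.
struct ZMorton {
    std::size_t n;
    std::vector<double> a;
    explicit ZMorton(std::size_t n_) : n(n_), a(n_ * n_, 0.0) {}
    static std::size_t index(std::size_t i, std::size_t j) {
        return (static_cast<std::size_t>(spread_bits(static_cast<std::uint32_t>(i))) << 1)
             | static_cast<std::size_t>(spread_bits(static_cast<std::uint32_t>(j)));
    }
    double& operator()(std::size_t i, std::size_t j) { return a[index(i, j)]; }
    double operator()(std::size_t i, std::size_t j) const { return a[index(i, j)]; }
};

// C += A * B on size x size blocks whose top-left corners are given by the
// (row, column) offsets. Only operator()(i, j) is required of each type.
template <class MA, class MB, class MC>
void mult_rec(const MA& A, const MB& B, MC& C,
              std::size_t ai, std::size_t aj,
              std::size_t bi, std::size_t bj,
              std::size_t ci, std::size_t cj,
              std::size_t size, std::size_t base) {
    if (size <= base) {
        // Base block: a plain triple loop stands in for a tuned kernel.
        for (std::size_t i = 0; i < size; ++i)
            for (std::size_t k = 0; k < size; ++k)
                for (std::size_t j = 0; j < size; ++j)
                    C(ci + i, cj + j) += A(ai + i, aj + k) * B(bi + k, bj + j);
        return;
    }
    const std::size_t h = size / 2;
    // Eight quadrant products: C[r][c] += A[r][k] * B[k][c].
    for (std::size_t r = 0; r < 2; ++r)
        for (std::size_t c = 0; c < 2; ++c)
            for (std::size_t k = 0; k < 2; ++k)
                mult_rec(A, B, C,
                         ai + r * h, aj + k * h,
                         bi + k * h, bj + c * h,
                         ci + r * h, cj + c * h,
                         h, base);
}

// Entry point: the same function template serves any triplet of matrix types.
template <class MA, class MB, class MC>
void multiply(const MA& A, const MB& B, MC& C, std::size_t base = 2) {
    assert(base >= 1);
    assert(A.n == B.n && B.n == C.n);
    assert(A.n != 0 && (A.n & (A.n - 1)) == 0);  // order must be a power of two
    mult_rec(A, B, C, 0, 0, 0, 0, 0, 0, A.n, base);
}
```

Calling `multiply(A, B, C)` with, say, a `RowMajor` A and `ZMorton` B and C instantiates the template for that particular triplet without any change to the recursive function. The layout-specific work is confined to each type's `operator()`, and swapping the triple loop for a library or hand-tuned kernel would only touch the base case.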