Finding neighbors of equal size in linear quadtrees and octrees in constant time
CVGIP: Image Understanding
Auto-blocking matrix-multiplication or tracking BLAS3 performance from source code
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Recursive array layouts and fast parallel matrix multiplication
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
STL tutorial and reference guide, second edition: C++ programming with the standard template library
STL tutorial and reference guide, second edition: C++ programming with the standard template library
Language support for Morton-order matrices
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Recursive Array Layouts and Fast Matrix Multiplication
IEEE Transactions on Parallel and Distributed Systems
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Tiling, Block Data Layout, and Memory Hierarchy Performance
IEEE Transactions on Parallel and Distributed Systems
Fast additions on masked integers
ACM SIGPLAN Notices
Is Morton layout competitive for large two-dimensional arrays yet?: Research Articles
Concurrency and Computation: Practice & Experience - 10th International Workshop on Compilers for Parallel Computers (CPC 2003)
Analyzing block locality in Morton-order and Morton-hybrid matrices
MEDEA '06 Proceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures
Seven at one stroke: results from a cache-oblivious paradigm for scalable matrix algorithms
Proceedings of the 2006 workshop on Memory system performance and correctness
Families of algorithms related to the inversion of a Symmetric Positive Definite matrix
ACM Transactions on Mathematical Software (TOMS)
Design for Interoperability in stapl: pMatrices and Linear Algebra Algorithms
Languages and Compilers for Parallel Computing
C++ Bindings to External Software Libraries with Examples from BLAS, LAPACK, UMFPACK, and MUMPS
ACM Transactions on Mathematical Software (TOMS)
Generic compressed sparse matrix insertion: algorithms and implementations in MTL4 and FEniCS
Proceedings of the 8th workshop on Parallel/High-Performance Object-Oriented Scientific Computing
Static reuse distances for locality-based optimizations in MATLAB
Proceedings of the 24th ACM International Conference on Supercomputing
The STAPL parallel container framework
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Statically typed matrix: in C++ library
Proceedings of the Fifth Balkan Conference in Informatics
A high-level Fortran interface to parallel matrix algebra
Proceedings of the Second International Conference on Computational Science, Engineering and Information Technology
Hi-index | 0.00 |
Positive results from new object-oriented tools for scientific programming are reported. Using template classes, abstractions of matrix representations are available that subsume conventional row-major, column-major, either Z- or И-Morton-order, as well as block-wise combinations of these. Moreover, the design of the Matrix Template Library (MTL) has been independently extended to provide recursators, to support block-recursive algorithms, supplementing MTL's iterators. Data types modeling both concepts enable the programmer to implement both iterative and recursive algorithms (or even both) on all of the aforementioned matrix representations at once for a wide family of important scientific operations. We illustrate the unrestricted applicability of our matrix-recursator on matrix multiplication. The same generic block-recursive function, unaltered, is instantiated on different triplets of matrix types. Within a base block, either a library multiplication or a user-provided, platform-specific code provides excellent performance. We achieve 77% of peak-performance using hand-tuned base cases without explicit prefetching. This excellent performance becomes available over a wide family of matrix representations from a single program. The techniques generalize to other applications in linear algebra.