Auto-blocking matrix-multiplication or tracking BLAS3 performance from source code
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Recursion leads to automatic variable blocking for dense linear-algebra algorithms
IBM Journal of Research and Development
Nonlinear array layouts for hierarchical memory systems
ICS '99 Proceedings of the 13th international conference on Supercomputing
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
I/O complexity: The red-blue pebble game
STOC '81 Proceedings of the thirteenth annual ACM symposium on Theory of computing
Analyzing block locality in Morton-order and Morton-hybrid matrices
ACM SIGARCH Computer Architecture News
Cache oblivious matrix operations using Peano curves
PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
Hardware-oriented implementation of cache oblivious matrix operations based on space-filling curves
PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics
Cache-oblivious polygon indecomposability testing
Proceedings of the 4th International Workshop on Parallel and Symbolic Computation
Hi-index | 0.00 |
Cache oblivious algorithms are algorithms that are designed to inherently exploit any kind of cache memory—regardless of its size or architecture. In this article, we discuss a cache oblivious algorithm for matrix multiplication. The elements of the matrices are stored according to a Peano space filling curve. A block recursive approach then leads to an algorithm where memory access to matrix elements is strictly local. Consequently, the algorithm shows several interesting properties considering cache performance, prefetching strategies, or even parallelization.