Finding neighbors of equal size in linear quadtrees and octrees in constant time
CVGIP: Image Understanding
Language support for Morton-order matrices
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Automatically tuned linear algebra software
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Recursive Array Layouts and Fast Matrix Multiplication
IEEE Transactions on Parallel and Distributed Systems
Ahnentafel Indexing into Morton-Ordered Arrays, or Matrix Locality for Free
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Matrix factorization using a block-recursive structure and block-recursive algorithms
Is Morton layout competitive for large two-dimensional arrays yet?: Research Articles
Concurrency and Computation: Practice & Experience - 10th International Workshop on Compilers for Parallel Computers (CPC 2003)
Analyzing block locality in Morton-order and Morton-hybrid matrices
MEDEA '06 Proceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures
Seven at one stroke: results from a cache-oblivious paradigm for scalable matrix algorithms
Proceedings of the 2006 workshop on Memory system performance and correctness
Analyzing block locality in Morton-order and Morton-hybrid matrices
ACM SIGARCH Computer Architecture News
Cache-oblivious polygon indecomposability testing
Proceedings of the 4th International Workshop on Parallel and Symbolic Computation
A style for programming problems from matrix algebra is developed with a familiar example and new tools, yielding high performance with a couple of surprising exceptions. The underlying philosophy is to use block recursion as the exclusive control structure, down to a 2^p × 2^p base case anyway, where hardware favors iterative style to fill its pipe. Use of Morton-ordered matrices yields excellent locality within the memory hierarchy, including block sharing among distributed computers. The recursion generalizes nicely to an SPMD program where such sharing is the only communication. Cholesky factorization of an n × n SPD matrix is used as a simple nontrivial example to expose the paradigm. The program amounts to four functions, two of which are finalizers for the other two. This insight allows final blocks to be shared with inter-node communication ∈ Θ(n^2) for this algorithm ∈ Θ(n^3) flops.
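To illustrate why Morton order suits block recursion, here is a minimal Python sketch of Morton (Z-order) indexing. It is not taken from the paper: the function names `morton_index` and `quadrant_ranges` are hypothetical, and the bit convention (row bits in the odd positions, so quadrants are ordered NW, NE, SW, SE) is an assumption. The sketch shows that each quadrant of a 2^k × 2^k matrix occupies one contiguous range of indices, which is what lets a recursive call hand off a whole block as a single stretch of memory.

```python
def morton_index(row: int, col: int) -> int:
    """Interleave the bits of row and col into one Morton (Z-order) index.

    Assumed convention: column bits land in even positions and row bits
    in odd positions, so the quadrant order is NW, NE, SW, SE.
    """
    index = 0
    bit = 0
    # Continue until both coordinates have no bits left at this position.
    while (row >> bit) or (col >> bit):
        index |= ((col >> bit) & 1) << (2 * bit)
        index |= ((row >> bit) & 1) << (2 * bit + 1)
        bit += 1
    return index


def quadrant_ranges(order: int) -> dict:
    """Return the (min, max) Morton index of each quadrant of a
    2**order x 2**order matrix (order must be at least 1)."""
    half = 1 << (order - 1)
    corners = {
        "NW": (0, 0),
        "NE": (0, half),
        "SW": (half, 0),
        "SE": (half, half),
    }
    ranges = {}
    for name, (r0, c0) in corners.items():
        indices = [
            morton_index(r, c)
            for r in range(r0, r0 + half)
            for c in range(c0, c0 + half)
        ]
        ranges[name] = (min(indices), max(indices))
    return ranges


if __name__ == "__main__":
    # Each quadrant of a 4 x 4 matrix covers four consecutive indices.
    for name, (lo, hi) in quadrant_ranges(2).items():
        print(name, lo, hi)
```

Because the same property holds at every level of the recursion, a sub-block at any depth is again a contiguous run of indices, which is consistent with the locality the abstract attributes to Morton-ordered matrices.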