A paradigm for parallel matrix algorithms: scalable Cholesky

Authors: David S. Wise; Craig Citro; Joshua Hursey; Fang Liu; Michael Rainey
Affiliation (all authors): Indiana University, Bloomington
Venue: Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Year: 2005



Abstract

A style for programming problems from matrix algebra is developed with a familiar example and new tools, yielding high performance with a couple of surprising exceptions. The underlying philosophy is to use block recursion as the exclusive control structure, down to a $2^p \times 2^p$ base case anyway, where hardware favors iterative style to fill its pipe. Use of Morton-ordered matrices yields excellent locality within the memory hierarchy, including block sharing among distributed computers. The recursion generalizes nicely to an SPMD program where such sharing is the only communication. Cholesky factorization of an $n \times n$ SPD matrix is used as a simple nontrivial example to expose the paradigm. The program amounts to four functions, two of which are finalizers for the other two. This insight allows final blocks to be shared with inter-node communication $\in \Theta(n^2)$ for this algorithm $\in \Theta(n^3)$ flops.
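To make the two central ideas of the abstract concrete (block recursion as the only control structure, and Morton-ordered storage), here is a minimal sequential sketch in Python. It is not the authors' four-function program, and it contains no SPMD communication. It is a textbook block-recursive Cholesky written as three mutually recursive routines over a flat Morton-ordered list. All names (`morton_index`, `quadrants`, `downdate`, `solve`, `cholesky`, `cholesky_lower`) are hypothetical. The sketch assumes n is a power of two and recurses all the way down to 1 × 1 blocks rather than stopping at a larger iterative base case.

```python
import math

def morton_index(i, j):
    """Interleave bits of row i and column j; the row bit is the higher of each pair."""
    idx, bit = 0, 0
    while i or j:
        idx |= (j & 1) << (2 * bit)
        idx |= (i & 1) << (2 * bit + 1)
        i, j, bit = i >> 1, j >> 1, bit + 1
    return idx

def quadrants(off, n):
    """Offsets of the NW, NE, SW, SE quadrants of the n x n block starting at off.

    In Morton order each quadrant is a contiguous quarter of its parent block.
    """
    q = (n // 2) ** 2
    return off, off + q, off + 2 * q, off + 3 * q

def downdate(m, c, a, b, n):
    """C -= A * B^T on n x n Morton blocks located at offsets c, a, b."""
    if n == 1:
        m[c] -= m[a] * m[b]
        return
    cq, aq, bq = quadrants(c, n), quadrants(a, n), quadrants(b, n)
    for i in range(2):
        for j in range(2):
            for k in range(2):
                # C_ij -= A_ik * (B_jk)^T
                downdate(m, cq[2 * i + j], aq[2 * i + k], bq[2 * j + k], n // 2)

def solve(m, b, l, n):
    """Overwrite B with X, where X * L^T = B and L is lower triangular."""
    if n == 1:
        m[b] /= m[l]
        return
    h = n // 2
    b11, b12, b21, b22 = quadrants(b, n)
    l11, _, l21, l22 = quadrants(l, n)
    for left, right in ((b11, b12), (b21, b22)):
        solve(m, left, l11, h)            # X_i1 = B_i1 * L11^-T
        downdate(m, right, left, l21, h)  # B_i2 -= X_i1 * L21^T
        solve(m, right, l22, h)           # X_i2 = B_i2 * L22^-T

def cholesky(m, a, n):
    """Factor the SPD block at offset a in place; only its lower triangle is read."""
    if n == 1:
        m[a] = math.sqrt(m[a])
        return
    h = n // 2
    a11, _, a21, a22 = quadrants(a, n)
    cholesky(m, a11, h)            # L11
    solve(m, a21, a11, h)          # L21 = A21 * L11^-T
    downdate(m, a22, a21, a21, h)  # A22 -= L21 * L21^T
    cholesky(m, a22, h)            # L22

def cholesky_lower(a):
    """Convenience wrapper: row-major nested lists in, lower factor L out."""
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes n is a power of two")
    m = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            m[morton_index(i, j)] = float(a[i][j])
    cholesky(m, 0, n)
    return [[m[morton_index(i, j)] if j <= i else 0.0 for j in range(n)]
            for i in range(n)]

A = [[4, 2, 0, 4], [2, 10, 3, 5], [0, 3, 2, 2], [4, 5, 2, 10]]
L = cholesky_lower(A)
```

Because every quadrant is a contiguous slice of the flat list, each recursive call touches a compact range of memory, which is the locality property the abstract attributes to Morton order. A practical version would replace the 1 × 1 base case with an iterative kernel on larger blocks, as the abstract suggests, and would exploit symmetry in the `downdate` step instead of updating the full square block.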