QR factorization with Morton-ordered quadtree matrices for memory re-use and parallelism

Authors: Jeremy D. Frens (Calvin College, Grand Rapids, MI); David S. Wise (Indiana University, Bloomington, IN)
Venue: Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Year: 2003


Abstract

Quadtree matrices stored in Morton order provide natural blocking at every level of a memory hierarchy. Writing the natural recursive algorithms to exploit this blocking yields code that respects the memory hierarchy without any need for code transformations. Furthermore, the divide-and-conquer algorithms break problems into independent computations, which can be dispatched in parallel for straightforward parallel processing. Proof of concept is given by a QR factorization algorithm based on Givens rotations for quadtree matrices in Morton-order storage. The resulting code performs well, competing with and even beating the equivalent LAPACK routine.
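To make the storage scheme concrete, here is a minimal Python sketch. It assumes a square matrix whose order is a power of two and a bit-interleaving convention in which the row bit is more significant, so each 2x2 block is laid out NW, NE, SW, SE. Other conventions exist. The helper names (`morton_index`, `to_morton`, `quadrant`, `givens`, `givens_qr_r`) are hypothetical and not taken from the paper. The Givens part is an ordinary row-order QR sweep, not the authors' quadtree algorithm. It only shows the plane rotation from which such a factorization is built.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

# Hypothetical illustration helpers; not the paper's code.


def morton_index(i: int, j: int) -> int:
    """Interleave the bits of row i and column j, row bit more significant,
    so each 2x2 block is stored in the order NW, NE, SW, SE."""
    z, bit = 0, 0
    while (i >> bit) or (j >> bit):
        z |= ((j >> bit) & 1) << (2 * bit)      # column bit -> even position
        z |= ((i >> bit) & 1) << (2 * bit + 1)  # row bit -> odd position
        bit += 1
    return z


def to_morton(a: Matrix) -> List[float]:
    """Copy an n x n matrix into a flat buffer in Morton order."""
    n = len(a)
    # Assumption of this sketch: n is a power of two (no padding handled).
    assert n > 0 and n & (n - 1) == 0, "sketch assumes n is a power of two"
    buf = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            buf[morton_index(i, j)] = a[i][j]
    return buf


def quadrant(buf: List[float], q: int) -> List[float]:
    """Quadrant q (0=NW, 1=NE, 2=SW, 3=SE) is one contiguous quarter of the
    buffer and is itself in Morton order, so recursion applies at every level."""
    size = len(buf) // 4
    return buf[q * size:(q + 1) * size]


def givens(a: float, b: float) -> Tuple[float, float]:
    """Return (c, s) such that c*a + s*b = hypot(a, b) and -s*a + c*b = 0."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r


def givens_qr_r(a: Matrix) -> Matrix:
    """Reduce a copy of an m x n matrix (m >= n) to upper-triangular R by
    Givens rotations, sweeping each column bottom-up in plain row order."""
    r = [row[:] for row in a]
    m, n = len(r), len(r[0])
    for j in range(n):
        for i in range(m - 1, j, -1):
            c, s = givens(r[i - 1][j], r[i][j])
            # Columns left of j are already zero in rows i-1 and i.
            for k in range(j, n):
                x, y = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * x + s * y
                r[i][k] = -s * x + c * y
            r[i][j] = 0.0  # store an exact zero instead of rounding residue
    return r
```

For a 4x4 matrix `a`, `quadrant(to_morton(a), 1)` equals `to_morton` of the 2x2 NE submatrix, and `givens_qr_r([[3.0, 1.0], [4.0, 2.0]])` gives approximately `[[5, 2.2], [0, 0.4]]`. Because every quadrant, at every depth, occupies a contiguous quarter of its parent's buffer, a divide-and-conquer routine that recurses on quadrants touches contiguous memory at each block size. This is the blocking property the abstract describes, and the four quadrant subproblems are natural units to dispatch independently.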