An Algorithm-by-Blocks for SuperMatrix Band Cholesky Factorization

Authors:
Gregorio Quintana-Ortí; Enrique S. Quintana-Ortí; Alfredo Remón; Robert A. van de Geijn
Affiliations:
Departamento de Ingeniería y Ciencia de Computadores, Universidad Jaume I, Spain;Departamento de Ingeniería y Ciencia de Computadores, Universidad Jaume I, Spain;Departamento de Ingeniería y Ciencia de Computadores, Universidad Jaume I, Spain;Department of Computer Sciences, The University of Texas at Austin, Austin 78712
Venue:
High Performance Computing for Computational Science - VECPAR 2008
Year:
2008

Abstract

We pursue the scalable parallel implementation of the factorization of band matrices with medium to large bandwidth, targeting SMP and multi-core architectures. Our approach decomposes the computation into a large number of fine-grained operations, exposing a higher degree of parallelism. The SuperMatrix run-time system allows an out-of-order scheduling of operations that is transparent to the programmer. Experimental results for the Cholesky factorization of band matrices on two parallel platforms with sixteen processors demonstrate the scalability of the solution.
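
The decomposition into fine-grained block operations can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the SuperMatrix or FLAME APIs. The function names `band_cholesky_tasks` and `run_task`, the dense list-of-lists storage, and the block size are all assumptions made for illustration. The sketch lists the block tasks of a right-looking Cholesky factorization, skips blocks that lie outside the band, and runs the tasks in program order. A run-time system could instead execute any task as soon as the tasks that produce its operands have finished.

```python
import math

def band_cholesky_tasks(nb, kb):
    """List block tasks (op, i, j, k) for a right-looking Cholesky on an
    nb x nb grid of blocks with block bandwidth kb, i.e. block (i, j) of the
    lower triangle is zero whenever i - j > kb. Hypothetical helper."""
    tasks = []
    for k in range(nb):
        last = min(nb, k + kb + 1)           # block rows below k inside the band
        tasks.append(("CHOL", k, k, k))      # factor diagonal block (k, k)
        for i in range(k + 1, last):
            tasks.append(("TRSM", i, k, k))  # solve for block (i, k)
        for i in range(k + 1, last):
            for j in range(k + 1, i + 1):    # update trailing blocks in the band
                tasks.append(("SYRK" if i == j else "GEMM", i, j, k))
    return tasks

def run_task(A, b, task):
    """Apply one block task in place to the dense matrix A (block size b)."""
    op, i, j, k = task
    r0, c0, p0 = i * b, j * b, k * b
    if op == "CHOL":
        # Unblocked Cholesky of the diagonal block (lower triangle only).
        for c in range(b):
            d = A[p0 + c][p0 + c] - sum(A[p0 + c][p0 + t] ** 2 for t in range(c))
            A[p0 + c][p0 + c] = math.sqrt(d)
            for r in range(c + 1, b):
                s = sum(A[p0 + r][p0 + t] * A[p0 + c][p0 + t] for t in range(c))
                A[p0 + r][p0 + c] = (A[p0 + r][p0 + c] - s) / A[p0 + c][p0 + c]
    elif op == "TRSM":
        # Block (i, k) := block (i, k) * inverse(transpose(L_kk)).
        for r in range(b):
            for c in range(b):
                s = sum(A[r0 + r][p0 + t] * A[p0 + c][p0 + t] for t in range(c))
                A[r0 + r][p0 + c] = (A[r0 + r][p0 + c] - s) / A[p0 + c][p0 + c]
    else:
        # SYRK / GEMM: block (i, j) -= block (i, k) * transpose(block (j, k)).
        for r in range(b):
            for c in range(b):
                A[r0 + r][c0 + c] -= sum(
                    A[r0 + r][p0 + t] * A[c0 + c][p0 + t] for t in range(b))

# Small demo: 8 x 8 symmetric positive definite matrix with bandwidth 2,
# viewed as a 4 x 4 grid of 2 x 2 blocks with block bandwidth 1.
n, b = 8, 2
A = [[5.0 if r == c else (-1.0 if abs(r - c) <= 2 else 0.0) for c in range(n)]
     for r in range(n)]
tasks = band_cholesky_tasks(n // b, 1)
for task in tasks:
    run_task(A, b, task)                     # lower triangle of A now holds L
print(len(tasks), "block tasks")
```

Treating every block operation as a separate task is what exposes the extra parallelism: for a fixed `k`, the `TRSM` tasks are mutually independent, and so are the `SYRK`/`GEMM` updates once their `TRSM` inputs are ready.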