Parallel Factorizations with Algorithmic Blocking

Author: Jaeyoung Choi
Venue: ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
Year: 2001

Abstract

Matrix factorization algorithms such as LU, QR, and Cholesky are the most widely used methods for solving dense linear systems of equations, and they have been extensively studied and implemented on vector and parallel computers. In this paper, we present parallel LU, QR, and Cholesky factorization routines that use "algorithmic blocking" on a two-dimensional block-cyclic data distribution. With algorithmic blocking, near-optimal performance can be obtained regardless of the physical block size, that is, the block size used to distribute the matrix over the processes. The routines are implemented on the SGI/Cray T3E and compared with the corresponding ScaLAPACK factorization routines.
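The abstract does not spell out the algorithm, so what follows is only a sequential Python sketch, under our own assumptions, of the distinction it relies on. `owner` gives the 2-D block-cyclic mapping for a physical block size `nb_phys` on a `P` x `Q` process grid. `blocked_lu` is a textbook right-looking blocked LU with partial pivoting whose panel width `nb_alg` (the algorithmic block) is chosen independently of `nb_phys`. All function and parameter names are hypothetical. This is neither the paper's T3E implementation nor a ScaLAPACK API.

```python
# Sketch: algorithmic blocking vs. physical (distribution) blocking.
# All names here are hypothetical illustrations, not ScaLAPACK or the
# paper's routines. Pure Python, sequential, no external libraries.
import random


def owner(i, j, nb_phys, P, Q):
    """Process (p, q) that owns global entry (i, j) when the matrix is
    split into nb_phys x nb_phys blocks dealt cyclically over a P x Q grid."""
    return (i // nb_phys) % P, (j // nb_phys) % Q


def panel_process_columns(k0, k1, nb_phys, Q):
    """Process columns holding at least one of the global columns k0..k1-1."""
    return sorted({(j // nb_phys) % Q for j in range(k0, k1)})


def blocked_lu(A, nb_alg):
    """Right-looking blocked LU with partial pivoting, done in place on a
    list-of-lists matrix A. nb_alg is the algorithmic block (panel width),
    chosen independently of any distribution block size. Afterwards A holds
    unit-lower L (strictly below the diagonal) and U; the returned piv
    satisfies (P A0)[i] = A0[piv[i]] = (L U)[i]."""
    n = len(A)
    piv = list(range(n))
    for k0 in range(0, n, nb_alg):
        k1 = min(k0 + nb_alg, n)
        # 1. Panel: unblocked LU of columns k0..k1-1. Row swaps move whole
        #    rows, which also applies them to the columns outside the panel.
        for k in range(k0, k1):
            p = max(range(k, n), key=lambda r: abs(A[r][k]))
            if p != k:
                A[k], A[p] = A[p], A[k]
                piv[k], piv[p] = piv[p], piv[k]
            for r in range(k + 1, n):
                A[r][k] /= A[k][k]
                for c in range(k + 1, k1):  # update the rest of the panel only
                    A[r][c] -= A[r][k] * A[k][c]
        # 2. Block row of U: U12 = L11^{-1} A12 (forward substitution).
        for k in range(k0, k1):
            for r in range(k + 1, k1):
                for c in range(k1, n):
                    A[r][c] -= A[r][k] * A[k][c]
        # 3. Trailing update: A22 -= L21 U12 (the matrix-multiply step).
        for r in range(k1, n):
            for c in range(k1, n):
                A[r][c] -= sum(A[r][k] * A[k][c] for k in range(k0, k1))
    return piv


def reconstruction_error(A0, A, piv):
    """Max-norm of (L U - P A0), with L and U read from the factored A."""
    n = len(A0)
    err = 0.0
    for i in range(n):
        for j in range(n):
            s = sum(A[i][k] * A[k][j] for k in range(min(i, j + 1)))
            if i <= j:
                s += A[i][j]  # diagonal of L is 1
            err = max(err, abs(s - A0[piv[i]][j]))
    return err


if __name__ == "__main__":
    random.seed(0)
    n = 12
    A0 = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    for nb_alg in (1, 4, 6, 12):
        A = [row[:] for row in A0]
        piv = blocked_lu(A, nb_alg)
        print(f"nb_alg={nb_alg:2d}  max|LU - PA| = {reconstruction_error(A0, A, piv):.1e}")
    # With physical block 1 on 2 process columns, a 4-wide panel spans both
    # process columns; with physical block 4 it sits on a single one.
    print(panel_process_columns(0, 4, 1, 2))  # [0, 1]
    print(panel_process_columns(0, 4, 4, 2))  # [0]
```

Because `nb_alg` only changes how the updates are grouped, every choice should yield the same factors up to rounding. In a distributed setting, the grouping presumably matters because it sets how much of the work falls into the matrix-multiply update (step 3) rather than the panel, and a panel wider than `nb_phys` spans several process columns.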