This article describes the design, implementation, and evaluation of a parallel algorithm for the Cholesky factorization of symmetric banded matrices. The algorithm is part of IBM's Parallel Engineering and Scientific Subroutine Library (PESSL) version 1.2 and is compatible with ScaLAPACK's banded solver. Analysis, as well as experiments on an IBM SP2 distributed-memory parallel computer, shows that the algorithm efficiently factors banded matrices with wide bandwidth. For example, a 31-node SP2 factors a large matrix more than 16 times faster than a single node would using the best sequential algorithm, and more than 20 times faster than a single node would using LAPACK's DPBTRF. The algorithm uses novel ideas in the area of distributed dense-matrix computations, including a dynamic schedule for a blocked systolic-like algorithm and the separation of the input and output layouts from the layout the algorithm uses internally. The algorithm also uses known techniques, such as blocking, to improve its communication-to-computation ratio and its data-cache behavior.
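As a concrete point of reference, the following sketch (ours, not from the paper) shows the sequential baseline the abstract compares against. It packs a small symmetric positive-definite banded matrix into LAPACK's upper banded storage and factors it with SciPy's cholesky_banded, which wraps the PBTRF family of routines (DPBTRF in double precision); the matrix, sizes, and values are illustrative assumptions.

import numpy as np
from scipy.linalg import cholesky_banded

n, k = 8, 2  # matrix order and half-bandwidth (toy sizes; the paper targets large, wide-band matrices)

# Dense symmetric positive-definite banded test matrix:
# diagonal 4, d-th off-diagonal 1/d, so it is diagonally dominant.
A = np.zeros((n, n))
for j in range(n):
    A[j, j] = 4.0
    for d in range(1, k + 1):
        if j + d < n:
            A[j, j + d] = A[j + d, j] = 1.0 / d

# Pack A into LAPACK upper banded storage:
# ab[k + i - j, j] = A[i, j] for max(0, j - k) <= i <= j.
ab = np.zeros((k + 1, n))
for j in range(n):
    for i in range(max(0, j - k), j + 1):
        ab[k + i - j, j] = A[i, j]

# Factor the band; SciPy dispatches to LAPACK's xPBTRF
# (DPBTRF for float64), the routine named in the abstract.
u = cholesky_banded(ab, lower=False)

# Unpack the banded Cholesky factor U and confirm U^T U = A.
U = np.zeros((n, n))
for j in range(n):
    for i in range(max(0, j - k), j + 1):
        U[i, j] = u[k + i - j, j]
assert np.allclose(U.T @ U, A)

The parallel algorithm the abstract describes computes the same factorization but distributes the band across the nodes of the machine; the packed array above is the sequential storage layout that DPBTRF operates on.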