Solving narrow banded systems on ensemble architectures

Authors: S. Lennart Johnsson
Venue: ACM Transactions on Mathematical Software (TOMS)
Year: 1985

Abstract

We present concurrent algorithms for the solution of narrow banded systems on ensemble architectures, and analyze the communication and arithmetic complexities of the algorithms. The algorithms consist of three phases. In phase 1, a block tridiagonal system of reduced size is produced through largely local operations. Diagonal dominance is preserved. If the original system is symmetric positive definite, so is the reduced system. The reduced system is solved in a second phase, and the remaining variables are obtained through local back substitution in a third phase. With a sufficient number of processing elements, there is no first and third phase. We investigate the arithmetic and communication complexity of Gaussian elimination and block cyclic reduction for the solution of the reduced system on boolean cubes, perfect shuffle and shuffle-exchange networks, binary trees, and linear arrays. With an optimum number of processors, the minimum solution time on a linear array is of an order that ranges from $O(m^2\sqrt{Nm})$ to $O(m^3 + m^3\log_2(N/m))$, depending on the bandwidth, the dimension of the problem, and the times for communication and arithmetic. For boolean cubes, cube-connected cycles, perfect shuffle and shuffle-exchange networks, and binary trees, the minimum time is $O(m^3 + m^3\log_2(N/m))$, including the communication complexity.
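
To make the second phase more concrete, the following is a minimal sketch of cyclic reduction for the simplest case, a scalar tridiagonal system (block size 1), run sequentially in one process. It is not the paper's block algorithm and it models no communication. The function name `cyclic_reduction` and the argument layout (`a` sub-diagonal, `b` diagonal, `c` super-diagonal, `d` right-hand side) are assumptions made for illustration. There is no pivoting, so the sketch relies on the matrix being diagonally dominant.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] for i = 0..n-1.

    Illustrative scalar sketch (hypothetical helper, not from the paper).
    a[0] and c[n-1] are ignored. Assumes a diagonally dominant matrix,
    so every division below is by a nonzero pivot.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    # Reduction step: eliminate the even-indexed unknowns from the
    # odd-indexed equations, giving a tridiagonal system of size n // 2.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        alpha = a[i] / b[i - 1]          # removes x[i-1] using equation i-1
        na = -alpha * a[i - 1]
        nb = b[i] - alpha * c[i - 1]
        nc = 0.0
        nd = d[i] - alpha * d[i - 1]
        if i + 1 < n:
            gamma = c[i] / b[i + 1]      # removes x[i+1] using equation i+1
            nb -= gamma * a[i + 1]
            nc = -gamma * c[i + 1]
            nd -= gamma * d[i + 1]
        ra.append(na)
        rb.append(nb)
        rc.append(nc)
        rd.append(nd)

    x = [0.0] * n
    # Solve the half-size system recursively (about log2(n) levels).
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)

    # Back substitution: each even-indexed unknown depends only on its
    # already-known odd-indexed neighbours.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

Within one level, every reduced equation and every back-substituted unknown is computed independently of the others, which is what makes the method attractive on parallel machines. The number of levels grows logarithmically with the system size, in line with the logarithmic term in the bounds quoted above. In the block setting the scalar divisions would become solves with m-by-m blocks.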