We present concurrent algorithms for the solution of narrow banded systems on ensemble architectures and analyze their communication and arithmetic complexities. The algorithms consist of three phases. In the first phase, a block tridiagonal system of reduced size is produced through largely local operations. Diagonal dominance is preserved, and if the original system is symmetric positive definite, so is the reduced system. The reduced system is solved in a second phase, and the remaining variables are obtained through local back substitution in a third phase. With a sufficient number of processing elements, the first and third phases are unnecessary. We investigate the arithmetic and communication complexity of Gaussian elimination and block cyclic reduction for the solution of the reduced system on boolean cubes, perfect shuffle and shuffle-exchange networks, binary trees, and linear arrays. With an optimum number of processors, the minimum solution time on a linear array is of an order that ranges from $O(m^2\sqrt{Nm})$ to $O(m^3 + m^3\log_2(N/m))$, depending on the bandwidth, the dimension of the problem, and the times for communication and arithmetic. For boolean cubes, cube-connected cycles, perfect shuffle and shuffle-exchange networks, and binary trees, the minimum time is $O(m^3 + m^3\log_2(N/m))$, including the communication complexity.
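As an illustration of the cyclic reduction idea named in the abstract, the following is a minimal sequential sketch for the scalar tridiagonal case (blocks of size 1). It is not the paper's parallel block algorithm: the function name `cyclic_reduction`, the array layout (`a` sub-diagonal with `a[0] = 0`, `b` diagonal, `c` super-diagonal with `c[-1] = 0`, `d` right-hand side), and the absence of pivoting are all assumptions made for this example. The sketch assumes a diagonally dominant system, so that no diagonal entry becomes zero during elimination.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by recursive odd-even (cyclic) reduction.

    Row i reads: a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].
    Assumed layout: a[0] == 0 and c[-1] == 0. No pivoting is performed.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    # Reduction step: eliminate the even-indexed unknowns from each
    # odd-indexed row, which leaves a tridiagonal system of about half
    # the size that couples only the odd-indexed unknowns.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        has_next = i + 1 < n
        alpha = a[i] / b[i - 1]                      # multiplier for row i-1
        gamma = c[i] / b[i + 1] if has_next else 0.0  # multiplier for row i+1
        ra.append(-alpha * a[i - 1])
        rb.append(b[i] - alpha * c[i - 1]
                  - (gamma * a[i + 1] if has_next else 0.0))
        rc.append(-gamma * c[i + 1] if has_next else 0.0)
        rd.append(d[i] - alpha * d[i - 1]
                  - (gamma * d[i + 1] if has_next else 0.0))

    # Solve the reduced system recursively.
    odd = cyclic_reduction(ra, rb, rc, rd)

    # Back substitution: recover each even-indexed unknown from its own
    # row, using the already-known odd-indexed neighbours.
    x = [0.0] * n
    x[1::2] = odd
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x


# Usage: a diagonally dominant 7x7 system with a known solution.
n = 7
a = [0.0] + [1.0] * (n - 1)
b = [4.0] * n
c = [1.0] * (n - 1) + [0.0]
x_true = [float(k + 1) for k in range(n)]
d = [
    (a[i] * x_true[i - 1] if i > 0 else 0.0)
    + b[i] * x_true[i]
    + (c[i] * x_true[i + 1] if i + 1 < n else 0.0)
    for i in range(n)
]
x = cyclic_reduction(a, b, c, d)
```

Each level of the recursion halves the number of unknowns, so the reduction depth grows logarithmically with the system size. In this sketch the eliminations within one level are independent of each other, which is the property a parallel implementation would exploit.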