We describe the solution of linear systems of equations, Ax = b, on distributed-memory concurrent computers whose interconnect topology contains a two-dimensional mesh. A is assumed to be an M×M banded matrix. The problem is generalized to the case in which there are nb distinct right-hand sides, b, and can thus be expressed as AX = B, where X and B are both M×nb matrices. The solution is obtained by the LU decomposition method, which proceeds in three stages: (1) LU decomposition of the matrix A, (2) forward reduction, and (3) back substitution. Since the matrix A is banded, a simple rectangular subblock decomposition of the matrices A, X, and B over the nodes of the ensemble results in excessive load imbalance. A scattered decomposition is therefore used to distribute the data. The sequential and concurrent algorithms are described in detail, and models of the performance of the concurrent algorithm are presented for each of the three stages. To ensure numerical stability, the algorithm is extended to include partial pivoting, and performance models for the pivoting case are also given. Results from a 128-node Caltech/JPL Mark II hypercube are presented, and the performance models are found to be in good agreement with these data. Indexing overhead was found to contribute significantly to the total concurrent overhead.
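The three stages and the load-balance argument can be illustrated with a minimal sequential sketch. It is not the paper's concurrent implementation: the function names, the dense list-of-lists storage, the half-bandwidth parameter `w`, and the P×Q node mesh are all assumptions made for illustration, and pivoting is omitted, so the input is assumed to be diagonally dominant.

```python
def banded_lu(A, w):
    """Stage 1: in-place LU decomposition without pivoting.
    A is dense M x M storage of a banded matrix with half-bandwidth w
    (entries with |i - j| > w are zero). L's multipliers overwrite the
    strict lower band; U overwrites the diagonal and upper band."""
    M = len(A)
    for k in range(M):
        hi = min(M, k + w + 1)          # the band limits the update window
        for i in range(k + 1, hi):
            A[i][k] /= A[k][k]
            for j in range(k + 1, hi):
                A[i][j] -= A[i][k] * A[k][j]
    return A

def forward_reduction(LU, B, w):
    """Stage 2: solve L Y = B in place (L has a unit diagonal)."""
    M, nb = len(LU), len(B[0])
    for k in range(M):
        for i in range(k + 1, min(M, k + w + 1)):
            for c in range(nb):
                B[i][c] -= LU[i][k] * B[k][c]
    return B

def back_substitution(LU, B, w):
    """Stage 3: solve U X = Y in place."""
    M, nb = len(LU), len(B[0])
    for i in range(M - 1, -1, -1):
        for c in range(nb):
            s = B[i][c]
            for j in range(i + 1, min(M, i + w + 1)):
                s -= LU[i][j] * B[j][c]
            B[i][c] = s / LU[i][i]
    return B

def solve_banded(A, B, w):
    """Run the three stages on copies of A (M x M) and B (M x nb)."""
    LU = banded_lu([row[:] for row in A], w)
    X = forward_reduction(LU, [row[:] for row in B], w)
    return back_substitution(LU, X, w)

def band_entries_per_node(M, w, P, Q, scattered):
    """Count in-band entries of A owned by each node of a P x Q mesh.
    scattered=False: contiguous rectangular subblocks.
    scattered=True:  entry (i, j) goes to node (i mod P, j mod Q)."""
    counts = [[0] * Q for _ in range(P)]
    for i in range(M):
        for j in range(max(0, i - w), min(M, i + w + 1)):
            if scattered:
                p, q = i % P, j % Q
            else:
                p, q = i * P // M, j * Q // M
            counts[p][q] += 1
    return counts
```

Comparing `band_entries_per_node(32, 4, 4, 4, scattered=False)` with the `scattered=True` result shows the effect described above: under the subblock layout, nodes far from the diagonal own no in-band entries at all, while the scattered layout spreads the band over every node.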