In this paper, we present an efficient parallel algorithm for solving Toeplitz-block and block-Toeplitz systems on distributed-memory multicomputers. The algorithm parallelizes the Generalized Schur Algorithm to obtain the semi-normal equations. Our parallel implementation reduces communication cost and optimizes memory access. An experimental analysis on a cluster of personal computers demonstrates the scalability of the implementation. The algorithm is portable because it is built on standard tools and libraries, such as ScaLAPACK and MPI.
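As a rough illustration of what solving through the semi-normal equations means, the sketch below solves a small scalar Toeplitz system T x = b via R^T R x = T^T b, where R is the upper-triangular Cholesky factor of T^T T. It is not the paper's method: a dense, sequential Cholesky factorization stands in for the Generalized Schur Algorithm, the matrix is scalar Toeplitz rather than block-Toeplitz, and the function names (`toeplitz`, `cholesky_upper`, `solve_seminormal`) are hypothetical helpers introduced only for this example.

```python
# Sketch only: a dense O(n^3) Cholesky factorization stands in for the
# Generalized Schur Algorithm, and nothing here is parallel or distributed.
import math


def toeplitz(first_col, first_row):
    """Build a dense Toeplitz matrix from its first column and first row."""
    n, m = len(first_col), len(first_row)
    return [[first_col[i - j] if i >= j else first_row[j - i] for j in range(m)]
            for i in range(n)]


def cholesky_upper(a):
    """Return upper-triangular R with R^T R = a (a symmetric positive definite)."""
    n = len(a)
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        s = a[i][i] - sum(r[k][i] ** 2 for k in range(i))
        r[i][i] = math.sqrt(s)
        for j in range(i + 1, n):
            r[i][j] = (a[i][j] - sum(r[k][i] * r[k][j] for k in range(i))) / r[i][i]
    return r


def solve_seminormal(t, b):
    """Solve T x = b through the semi-normal equations R^T R x = T^T b."""
    n = len(t)
    # Gram matrix T^T T and right-hand side T^T b.
    gram = [[sum(t[k][i] * t[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
    rhs = [sum(t[k][i] * b[k] for k in range(n)) for i in range(n)]
    r = cholesky_upper(gram)
    # Forward substitution: R^T y = T^T b.
    y = [0.0] * n
    for i in range(n):
        y[i] = (rhs[i] - sum(r[k][i] * y[k] for k in range(i))) / r[i][i]
    # Back substitution: R x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(r[i][k] * x[k] for k in range(i + 1, n))) / r[i][i]
    return x
```

A call such as `solve_seminormal(toeplitz([4.0, 1.0, 0.5], [4.0, 2.0, 1.0]), b)` returns the solution of a 3-by-3 nonsymmetric Toeplitz system. Only R and T are needed in the two triangular solves, so the orthogonal factor of T is never formed explicitly.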