Direct solvers based on prefix computation and cyclic reduction exploit the special structure of tridiagonal systems of equations to deliver better parallel performance than solvers designed for more general systems. This advantage is even more pronounced for block tridiagonal systems. In this paper, we re-examine the performance of these two algorithms, taking the effect of block size into account. The parameter space spanned by the number of block rows, the block size, and the processor count is shown to divide into regions, each favoring one of the two algorithms. A critical block size separating these regions emerges, and its dependence on both problem-dependent parameters and machine-specific constants is established. These analytical findings are verified empirically on up to 2048 cores of a Cray XT4 system.
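For concreteness, the following is a minimal serial sketch of scalar cyclic reduction for a single tridiagonal system; it illustrates the algorithm the abstract refers to, but it is not the parallel block variant analyzed in the paper. The function name, the pure-Python formulation, and the restriction to n = 2**k - 1 unknowns are assumptions of this sketch; a block variant would replace the scalar divisions by b[i] with small solves against the corresponding diagonal blocks.

    def cyclic_reduction(a, b, c, d):
        """Solve the tridiagonal system
            a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i],  0 <= i < n,
        with a[0] == 0 and c[n-1] == 0.

        Illustrative sketch only: requires n == 2**k - 1 and assumes
        no pivoting is needed (e.g., a diagonally dominant system).
        """
        n = len(b)
        if n & (n + 1) != 0:
            raise ValueError("this sketch requires n = 2**k - 1")
        a, b, c, d = list(a), list(b), list(c), list(d)  # work on copies

        # Forward reduction: at stride s, each active equation eliminates
        # its neighbours at distance s; the stride doubles every level,
        # halving the number of coupled equations.
        s = 1
        while s < (n + 1) // 2:
            for i in range(2 * s - 1, n, 2 * s):
                alpha = -a[i] / b[i - s]   # eliminates x[i - s]
                gamma = -c[i] / b[i + s]   # eliminates x[i + s]
                a[i] = alpha * a[i - s]
                b[i] += alpha * c[i - s] + gamma * a[i + s]
                c[i] = gamma * c[i + s]
                d[i] += alpha * d[i - s] + gamma * d[i + s]
            s *= 2

        # A single equation remains, for the middle unknown.
        x = [0.0] * n
        x[s - 1] = d[s - 1] / b[s - 1]

        # Back substitution: halve the stride each level, solving the
        # unknowns that lie between already-known ones. Out-of-range
        # neighbours carry zero coefficients, so 0.0 stands in safely.
        s //= 2
        while s >= 1:
            for i in range(s - 1, n, 2 * s):
                left = x[i - s] if i - s >= 0 else 0.0
                right = x[i + s] if i + s < n else 0.0
                x[i] = (d[i] - a[i] * left - c[i] * right) / b[i]
            s //= 2
        return x

    # Quick check on a 3x3 system with known solution x = [1, 1, 1]:
    # cyclic_reduction([0, 1, 1], [2, 2, 2], [1, 1, 0], [3, 4, 3])

Each forward-reduction level is independent across equations and halves the number of coupled unknowns, which gives cyclic reduction its O(log n) parallel depth; the prefix-computation approach reaches a comparable depth by recasting the three-term recurrence as an associative scan.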