In this paper, we present a scalable, numerically stable, high-performance tridiagonal solver. The solver is based on the SPIKE algorithm, which partitions a large matrix into small independent matrices that can be solved in parallel. For each small matrix, our solver applies a general 1-by-1 or 2-by-2 diagonal pivoting algorithm, which is also known to be numerically stable. Our paper makes two major contributions. First, our solver is the first numerically stable tridiagonal solver for GPUs. It provides solution quality comparable to Intel MKL and Matlab, at speeds comparable to the GPU tridiagonal solvers in existing packages such as CUSPARSE. It also scales to multiple GPUs and CPUs. Second, we present and analyze two key optimization strategies for our solver: a high-throughput data layout transformation for memory efficiency, and a dynamic tiling approach for reducing the memory access footprint caused by branch divergence.
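To make the partitioning idea concrete, the following is a minimal CPU sketch of a SPIKE-style solve with exactly two partitions. It is not the solver described in this paper: the function names (`thomas`, `spike_two_partitions`) and the split index `m` are hypothetical, and each block is solved with the plain Thomas algorithm (no pivoting) instead of the 1-by-1 or 2-by-2 diagonal pivoting the abstract refers to, so it is only reliable for well-conditioned inputs such as diagonally dominant matrices. The arrays `a`, `b`, and `c` are assumed to hold the sub-diagonal, diagonal, and super-diagonal, with `a[0]` and `c[-1]` unused.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system with the Thomas algorithm (no pivoting).

    a: sub-diagonal (a[0] is ignored), b: diagonal,
    c: super-diagonal (c[-1] does not affect the result), d: right-hand side.
    """
    n = len(b)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward elimination.
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def spike_two_partitions(a, b, c, d, m):
    """SPIKE-style solve using two partitions: rows [0, m) and rows [m, n).

    Assumes 1 <= m < n. Illustrative only; block solves use `thomas`.
    """
    n = len(b)
    a1, b1, c1, d1 = a[:m], b[:m], c[:m], d[:m]
    a2, b2, c2, d2 = a[m:], b[m:], c[m:], d[m:]

    # The four block solves are independent and could run in parallel.
    y1 = thomas(a1, b1, c1, d1)                    # A1 y1 = d1
    y2 = thomas(a2, b2, c2, d2)                    # A2 y2 = d2
    e_last = [0.0] * m
    e_last[-1] = c[m - 1]                          # coupling of block 1 to x[m]
    v = thomas(a1, b1, c1, e_last)                 # right spike of block 1
    e_first = [0.0] * (n - m)
    e_first[0] = a[m]                              # coupling of block 2 to x[m-1]
    w = thomas(a2, b2, c2, e_first)                # left spike of block 2

    # Reduced 2-by-2 system for the interface unknowns p = x[m-1], q = x[m]:
    #   p + v[-1] * q = y1[-1]
    #   w[0] * p + q  = y2[0]
    det = 1.0 - v[-1] * w[0]
    p = (y1[-1] - v[-1] * y2[0]) / det
    q = (y2[0] - w[0] * y1[-1]) / det

    # Recover the full solution from the block solutions and the spikes.
    x1 = [y1[i] - v[i] * q for i in range(m)]
    x2 = [y2[i] - w[i] * p for i in range(n - m)]
    return x1 + x2
```

Calling `spike_two_partitions(a, b, c, d, m)` should return the same solution as `thomas(a, b, c, d)` up to rounding error when the matrix is diagonally dominant. A GPU implementation would use many partitions and a correspondingly larger reduced system; two partitions are used here only to keep the algebra visible.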