A Nonstandard Cyclic Reduction Method, Its Variants and Stability
SIAM Journal on Matrix Analysis and Applications
A Fast Direct Solution of Poisson's Equation Using Fourier Analysis
Journal of the ACM (JACM)
An Efficient Parallel Algorithm for the Solution of a Tridiagonal Linear System of Equations
Journal of the ACM (JACM)
SIAM Journal on Scientific Computing
Scan primitives for GPU computing
Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware
Fast tridiagonal solvers on the GPU
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Cyclic Reduction Tridiagonal Solvers on GPUs Applied to Mixed-Precision Multigrid
IEEE Transactions on Parallel and Distributed Systems
Matrix decomposition algorithms for elliptic boundary value problems: a survey
Numerical Algorithms
Register packing for cyclic reduction: a case study
Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units
An Auto-tuned Method for Solving Large Tridiagonal Systems on the GPU
IPDPS '11 Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium
A Scalable Tridiagonal Solver for GPUs
ICPP '11 Proceedings of the 2011 International Conference on Parallel Processing
Memory Hierarchy Optimization for Large Tridiagonal System Solvers on GPU
ISPA '12 Proceedings of the 2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications
Hi-index | 0.00 |
Two block cyclic reduction linear system solvers are considered and implemented using the OpenCL framework. The topics of interest include a simplified scalar cyclic reduction tridiagonal system solver and the impact of increasing the radix-number of the algorithm. Both implementations are tested for the Poisson problem in two and three dimensions, using a Nvidia GTX 580 series GPU and double precision floating-point arithmetic. The numerical results indicate up to 6-fold speed increase in the case of the two-dimensional problems and up to 3-fold speed increase in the case of the three-dimensional problems when compared to equivalent CPU implementations run on a Intel Core i7 quad-core CPU.