The aim of this paper is to show that a special kind of boundary value problem (BVP) for second-order ordinary differential equations can be solved efficiently on modern heterogeneous architectures that combine CPUs with NVIDIA Fermi GPUs. The problem reduces to solving a large tridiagonal system of linear equations with an almost Toeplitz structure. The algorithm considered here is based on a recently developed divide-and-conquer method for solving linear recurrence systems with constant coefficients.
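The abstract compresses the whole pipeline into two sentences. The Python sketch below illustrates one way the pieces can fit together. It is written under our own assumptions and is not the authors' code. The model problem is hypothetical: y'' - c*y = f(x) on [0, 1] with Dirichlet conditions and c > 0. Central differences on n interior nodes (h = 1/(n+1)) turn it into the constant-coefficient system tridiag(1, d, 1) y = h^2 f with d = -(2 + c*h^2). This system is exactly Toeplitz; the paper's "almost Toeplitz" variant is not reproduced. The solver writes T = L U + alpha*e1*e1^T, where L and U are bidiagonal with constant entries (alpha*beta = 1, alpha + beta = d). Each triangular solve then becomes a first-order linear recurrence with constant coefficients. These recurrences are evaluated with a blocked three-phase divide-and-conquer scheme that stands in for the authors' method, and a Sherman-Morrison step removes the rank-one term. All function names are hypothetical. The blocks run sequentially here, whereas a CPU+GPU implementation would process them in parallel.

```python
import math


def dc_recurrence(a, b, x0=0.0, nblocks=4):
    """Solve x[i] = a*x[i-1] + b[i] for i = 0..n-1 with x[-1] = x0.

    Blocked divide-and-conquer evaluation (an illustrative sketch):
    phases 1 and 3 treat every block independently; phase 2 is a short
    sequential pass with one step per block.
    """
    n = len(b)
    size = max(1, -(-n // nblocks))  # ceil(n / nblocks)
    starts = list(range(0, n, size))
    x = [0.0] * n
    # Phase 1: solve each block as if the value entering it were zero.
    for s in starts:
        prev = 0.0
        for i in range(s, min(s + size, n)):
            prev = a * prev + b[i]
            x[i] = prev
    # Phase 2: compute the true value entering each block.
    carries, carry = [], x0
    for s in starts:
        carries.append(carry)
        e = min(s + size, n)
        carry = x[e - 1] + a ** (e - s) * carry
    # Phase 3: element j of a block gets the correction a**(j+1) * carry.
    for s, carry in zip(starts, carries):
        p = a
        for i in range(s, min(s + size, n)):
            x[i] += p * carry
            p *= a
    return x


def solve_toeplitz_tridiag(d, rhs, nblocks=4):
    """Solve tridiag(1, d, 1) x = rhs for |d| > 2 via T = L U + alpha*e1*e1^T."""
    n = len(rhs)
    disc = math.sqrt(d * d - 4.0)
    r1, r2 = (d + disc) / 2.0, (d - disc) / 2.0  # roots of t^2 - d*t + 1 = 0
    # Pick |alpha| < 1 < |beta| so that both recurrences below are stable.
    alpha, beta = (r1, r2) if abs(r1) < abs(r2) else (r2, r1)

    def solve_lu(v):
        # L y = v: unit diagonal, subdiagonal alpha -> y[i] = -alpha*y[i-1] + v[i]
        y = dc_recurrence(-alpha, v, 0.0, nblocks)
        # U x = y: diagonal beta, superdiagonal 1 -> run backwards in reversed order
        z = dc_recurrence(-1.0 / beta, [yi / beta for yi in reversed(y)], 0.0, nblocks)
        return z[::-1]

    # L U differs from T only in entry (0, 0); Sherman-Morrison corrects for it.
    u = solve_lu(rhs)
    w = solve_lu([1.0] + [0.0] * (n - 1))
    gamma = alpha * u[0] / (1.0 + alpha * w[0])
    return [ui - gamma * wi for ui, wi in zip(u, w)]


def solve_bvp(c, f, ya, yb, n, nblocks=4):
    """Hypothetical model BVP: y'' - c*y = f(x) on [0, 1], y(0) = ya, y(1) = yb, c > 0."""
    h = 1.0 / (n + 1)
    d = -(2.0 + c * h * h)
    rhs = [h * h * f((i + 1) * h) for i in range(n)]
    rhs[0] -= ya   # move the boundary values to the right-hand side
    rhs[-1] -= yb
    return solve_toeplitz_tridiag(d, rhs, nblocks)


if __name__ == "__main__":
    c, n = 1.0, 200
    y = solve_bvp(c, lambda t: -(math.pi ** 2 + c) * math.sin(math.pi * t), 0.0, 0.0, n)
    h = 1.0 / (n + 1)
    err = max(abs(y[i] - math.sin(math.pi * (i + 1) * h)) for i in range(n))
    print(f"max error vs. sin(pi*x): {err:.2e}")
```

The constant coefficients are what make phase 3 cheap: the correction for each block needs only powers of a single number, a. With variable coefficients, running products would have to be carried through the blocks instead.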