SIAM Journal on Scientific and Statistical Computing
Modern C++ design: generic programming and design patterns applied
Iterative Methods for Sparse Linear Systems
An Improved Magma Gemm For Fermi Graphics Processing Units
International Journal of High Performance Computing Applications
An automatic OpenCL compute kernel generator framework for linear algebra operations is presented. It allows matrix and vector operations to be specified in high-level C++ code, while the low-level details of OpenCL compute kernel generation and handling are dealt with in the background. Our approach spares users the considerable additional effort of learning the details of programming graphics processing units (GPUs), and we demonstrate that it achieves higher performance than a fixed, predefined set of OpenCL compute kernels because launch overhead is minimized. The generator is available in the Vienna Computing Library (ViennaCL) and is demonstrated here with the stabilized bi-conjugate gradient algorithm, for which performance gains of up to 40 percent are observed.
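To illustrate the general idea of turning a high-level C++ expression into a single OpenCL kernel, the following is a minimal sketch using expression templates. It is not ViennaCL's implementation or API: the names `SymVec`, `SymScalar`, `BinaryExpr`, `generate_kernel`, and `example_kernel` are hypothetical and introduced only for this illustration. The sketch only builds the kernel source string; compiling and launching it through the OpenCL runtime is left out.

```cpp
#include <cassert>
#include <string>

// Hypothetical symbolic vector: carries only the name of an OpenCL buffer argument.
struct SymVec {
    using symbolic = void;  // marker used to constrain the operators below
    std::string name;
    std::string expr() const { return name + "[i]"; }  // per-element access
    std::string args() const { return "__global const float* " + name; }
};

// Hypothetical symbolic scalar, passed to the kernel by value.
struct SymScalar {
    using symbolic = void;
    std::string name;
    std::string expr() const { return name; }
    std::string args() const { return "float " + name; }
};

// Node of the expression tree built at compile time by operator+ and operator*.
template <typename L, typename R>
struct BinaryExpr {
    using symbolic = void;
    L lhs;
    R rhs;
    char op;
    std::string expr() const {
        return "(" + lhs.expr() + " " + op + " " + rhs.expr() + ")";
    }
    // Note: this sketch does not deduplicate operands that appear more than once.
    std::string args() const { return lhs.args() + ", " + rhs.args(); }
};

// The operators only participate for types that carry the `symbolic` marker.
template <typename L, typename R,
          typename = typename L::symbolic, typename = typename R::symbolic>
BinaryExpr<L, R> operator+(const L& l, const R& r) { return {l, r, '+'}; }

template <typename L, typename R,
          typename = typename L::symbolic, typename = typename R::symbolic>
BinaryExpr<L, R> operator*(const L& l, const R& r) { return {l, r, '*'}; }

// Emit OpenCL C source for `result = rhs` as one fused element-wise kernel.
template <typename E>
std::string generate_kernel(const std::string& kernel_name,
                            const SymVec& result, const E& rhs) {
    assert(!kernel_name.empty());
    return "__kernel void " + kernel_name + "(__global float* " + result.name +
           ", " + rhs.args() + ", unsigned int n) {\n"
           "  for (unsigned int i = get_global_id(0); i < n; i += get_global_size(0))\n"
           "    " + result.name + "[i] = " + rhs.expr() + ";\n"
           "}\n";
}

// Usage: x = y + alpha * z becomes a single kernel.
std::string example_kernel() {
    SymVec x{"x"}, y{"y"}, z{"z"};
    SymScalar alpha{"alpha"};
    return generate_kernel("fused_update", x, y + alpha * z);
}
```

With a fixed kernel set, `x = y + alpha * z` would typically require one kernel for the scaling and another for the addition. Here the whole expression is emitted as one kernel body, `x[i] = (y[i] + (alpha * z[i]));`, so only one launch is needed. This is one plausible reading of how generating kernels per expression reduces launch overhead.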