We present several algorithms to compute the solution of a linear system of equations on a graphics processor (GPU), as well as general techniques to improve their performance, such as padding and hybrid GPU-CPU computation. We compare the single- and double-precision performance of a modern GPU with unified architecture, and show how mixed-precision iterative refinement can be used to regain full accuracy in the solution of linear systems while exploiting the processor's potential for single-precision arithmetic. Experimental results on a GTX280 using CUBLAS 2.0, the implementation of BLAS for NVIDIA® GPUs with unified architecture, illustrate the performance of the different algorithms and techniques proposed. Copyright © 2009 John Wiley & Sons, Ltd.
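
The idea behind mixed-precision iterative refinement can be illustrated with a small CPU-only sketch. It is not the authors' GPU implementation and does not use CUBLAS. It assumes a dense, well-conditioned matrix, emulates single precision by rounding every intermediate result through `struct`, and uses hypothetical helper names (`f32`, `lu_factor_f32`, `lu_solve_f32`, `solve_mixed`). The costly LU factorization and the triangular solves run in single precision; only the residual and the solution update are computed in double precision.

```python
import struct


def f32(x):
    """Round a double-precision Python float to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_f32(A):
    """LU factorization with partial pivoting; each operation is rounded to float32."""
    n = len(A)
    LU = [[f32(v) for v in row] for row in A]
    perm = list(range(n))  # perm[i] = original index of the row now at position i
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        if LU[p][k] == 0.0:
            raise ValueError("matrix is singular")
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] = f32(LU[i][k] / LU[k][k])
            for j in range(k + 1, n):
                LU[i][j] = f32(LU[i][j] - f32(LU[i][k] * LU[k][j]))
    return LU, perm


def lu_solve_f32(LU, perm, b):
    """Forward and backward substitution, rounded to float32 throughout."""
    n = len(LU)
    y = [f32(b[perm[i]]) for i in range(n)]
    for i in range(n):  # solve L y = P b (unit lower triangular L)
        for j in range(i):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
    for i in reversed(range(n)):  # solve U x = y
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
        y[i] = f32(y[i] / LU[i][i])
    return y


def solve_mixed(A, b, max_iter=10, tol=1e-14):
    """Single-precision LU solve followed by double-precision refinement steps."""
    n = len(A)
    LU, perm = lu_factor_f32(A)       # factorize once, in single precision
    x = lu_solve_f32(LU, perm, b)     # initial solution, single-precision accuracy
    b_norm = max(abs(v) for v in b)
    for _ in range(max_iter):
        # Residual r = b - A x, computed in double precision.
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) <= tol * b_norm:
            break
        d = lu_solve_f32(LU, perm, r)  # correction, reusing the float32 factors
        x = [xi + di for xi, di in zip(x, d)]  # update in double precision
    return x
```

Calling `solve_mixed(A, b)` on a small well-conditioned system should give a solution noticeably closer to the exact one than a single `lu_solve_f32` call, because each refinement step reuses the cheap single-precision factors while the double-precision residual exposes the remaining error. For ill-conditioned matrices the refinement may fail to converge, so the iteration cap `max_iter` is kept as a safeguard.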