Exploiting the capabilities of modern GPUs for dense matrix computations

Authors:
Sergio Barrachina;Maribel Castillo;Francisco D. Igual;Rafael Mayo;Enrique S. Quintana-Ortí;Gregorio Quintana-Ortí
Affiliations:
Dept. Ingeniería y Ciencia de los Computadores, Universidad Jaume I. Av. Sos Baynat, s-n. 12071 Castellón, Spain (all authors)
Venue:
Concurrency and Computation: Practice & Experience
Year:
2009

Citing 0
Cited 13

A mixed-precision algorithm for the solution of Lyapunov equations on hybrid CPU-GPU platforms

Parallel Computing
Accelerating model reduction of large linear systems with graphics processors

PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume 2
Design patterns for scientific computations on sparse matrices

Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing
A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures

ACM Transactions on Mathematical Software (TOMS)
The FLAME approach: From dense linear algebra algorithms to high-performance multi-accelerator implementations

Journal of Parallel and Distributed Computing
Speeding up solving of differential matrix Riccati equations using GPGPU computing and MATLAB

Concurrency and Computation: Practice & Experience
GPU acceleration of the caffa3d.MB model

ICCSA'12 Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part IV
Tuning solution of large non-Hermitian linear systems on multiple graphics processing unit accelerated workstations

International Journal of High Performance Computing Applications
Accelerating BST methods for model reduction with graphics processors

PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I
Towards a finite volume model on a many-core platform

International Journal of High Performance Systems Architecture
All-pairs computations on many-core graphics processors

Parallel Computing
Unleashing CPU-GPU acceleration for control theory applications

Euro-Par'12 Proceedings of the 18th international conference on Parallel processing workshops
Design patterns for sparse-matrix computations on hybrid CPU/GPU platforms

Scientific Programming

Abstract

We present several algorithms for solving a linear system of equations on a graphics processor (GPU), together with general techniques that improve their performance, such as padding and hybrid GPU-CPU computation. We compare the single- and double-precision performance of a modern GPU with unified architecture, and show how mixed-precision iterative refinement can be used to regain full accuracy in the solution of linear systems while exploiting the processor's potential for single-precision arithmetic. Experimental results on a GTX280 using CUBLAS 2.0, the implementation of BLAS for NVIDIA® GPUs with unified architecture, illustrate the performance of the proposed algorithms and techniques. Copyright © 2009 John Wiley & Sons, Ltd.
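
The mixed-precision iterative refinement mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it runs on the CPU with NumPy rather than on a GPU with CUBLAS, the function name `solve_mixed_precision` and its parameters are hypothetical, and for brevity it calls a full single-precision solve at every step instead of reusing one LU factorization, as a practical code would. The idea it shows is the general one: do the expensive solves in single precision, compute the residual in double precision, and accumulate corrections until the solution reaches double-precision accuracy.

```python
import numpy as np


def solve_mixed_precision(A, b, tol=1e-12, max_iter=30):
    """Solve A x = b using float32 solves with float64 residual correction.

    Hypothetical helper for illustration only; assumes A is square and
    reasonably well conditioned, otherwise the refinement may not converge.
    """
    A64 = np.asarray(A, dtype=np.float64)
    b64 = np.asarray(b, dtype=np.float64)
    A32 = A64.astype(np.float32)  # low-precision copy used for every solve

    # Initial solution computed entirely in single precision.
    x = np.linalg.solve(A32, b64.astype(np.float32)).astype(np.float64)

    for _ in range(max_iter):
        r = b64 - A64 @ x  # residual evaluated in double precision
        if np.linalg.norm(r) <= tol * np.linalg.norm(b64):
            break
        # Correction solved in single precision, accumulated in double.
        # (A practical code would reuse one LU factorization of A32 here.)
        d = np.linalg.solve(A32, r.astype(np.float32))
        x += d.astype(np.float64)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 50
    A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned test matrix
    x_true = rng.standard_normal(n)
    b = A @ x_true
    x = solve_mixed_precision(A, b)
    print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```

Only the residual and the update are computed in double precision, both of which cost O(n^2) operations, while the O(n^3) factorization work stays in single precision. That split is what allows hardware with much faster single-precision arithmetic to be used without giving up accuracy in the final solution.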