Tuning solution of large non-Hermitian linear systems on multiple graphics processing unit accelerated workstations

Authors:
Florian Ries; Tommaso De Marco; Roberto Guerrieri
Affiliation (all authors):
Advanced Research Center on Electronic Systems for Information and Communication Technologies, E. De Castro (ARCES), Viale Carlo Pepoli 3/2, 40123 Bologna, Italy
Venue:
International Journal of High Performance Computing Applications
Year:
2012

Abstract

This work deals with the solution of large non-Hermitian linear systems on desktop workstations equipped with multiple graphics processing units (GPUs). Although our implementation is motivated by the need to accelerate volume conductor modeling for bioelectrical brain imaging, the problem itself is common in scientific computing: whenever a complex-valued partial differential equation is solved numerically, a sparse, complex, and typically non-Hermitian linear system must be solved. For problems with millions of unknowns, this can take a long time even with highly optimized CPU-based solvers. Our GPU-accelerated solver outperforms an optimized OpenMP-based reference running on two quad-core CPUs by a factor of up to 31× in single precision and up to 7× in double precision, at the cost of a very modest hardware upgrade: two dual-GPU GTX 295 graphics cards. A pair of more powerful Fermi GPUs (GTX 480) achieves speedups of 30× in single precision and 15× in double precision.
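The abstract does not name the iterative method used. Purely as an illustration, the following is a minimal NumPy sketch of unpreconditioned BiCGStab, one common Krylov method for non-Hermitian systems, applied to a sparse complex matrix stored in CSR format. The CSR layout, the function names `csr_matvec`, `bicgstab` and `tridiag_csr`, and the tridiagonal test matrix are hypothetical choices for this example, not the authors' implementation. The sparse matrix-vector product is the kernel that GPU solvers usually parallelize row by row, and for complex arithmetic the inner products must conjugate one argument (`np.vdot` conjugates its first).

```python
import numpy as np


def csr_matvec(indptr, indices, data, x):
    """Compute y = A @ x for a complex matrix A stored in CSR format."""
    n = len(indptr) - 1
    y = np.zeros(n, dtype=np.result_type(data, x))
    for i in range(n):  # rows are independent: on a GPU each row could map to a thread or warp
        start, end = indptr[i], indptr[i + 1]
        # np.dot does not conjugate, so this is the plain sum_j A[i, j] * x[j]
        y[i] = np.dot(data[start:end], x[indices[start:end]])
    return y


def bicgstab(A, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned BiCGStab for a (possibly non-Hermitian) complex system A x = b.

    A is a tuple (indptr, indices, data) in CSR format. Returns (x, iterations).
    """
    x = np.zeros_like(b)
    r = b - csr_matvec(*A, x)
    r_hat = r.copy()  # fixed shadow residual
    rho_old = alpha = omega = 1.0 + 0j
    v = np.zeros_like(b)
    p = np.zeros_like(b)
    norm_b = np.linalg.norm(b)
    if norm_b == 0:
        return x, 0
    for k in range(1, max_iter + 1):
        rho = np.vdot(r_hat, r)  # conjugated inner product <r_hat, r>
        if rho == 0:
            raise RuntimeError("BiCGStab breakdown: rho == 0")
        beta = (rho / rho_old) * (alpha / omega)
        p = r + beta * (p - omega * v)  # on the first pass p == r, since v and p start at zero
        v = csr_matvec(*A, p)
        alpha = rho / np.vdot(r_hat, v)
        s = r - alpha * v
        if np.linalg.norm(s) <= tol * norm_b:
            return x + alpha * p, k
        t = csr_matvec(*A, s)
        omega = np.vdot(t, s) / np.vdot(t, t)
        x = x + alpha * p + omega * s
        r = s - omega * t
        if np.linalg.norm(r) <= tol * norm_b:
            return x, k
        rho_old = rho
    return x, max_iter


def tridiag_csr(n, lower, diag, upper):
    """Build an n x n tridiagonal matrix with constant complex bands in CSR format."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        if i > 0:
            indices.append(i - 1)
            data.append(lower)
        indices.append(i)
        data.append(diag)
        if i < n - 1:
            indices.append(i + 1)
            data.append(upper)
        indptr.append(len(indices))
    return np.array(indptr), np.array(indices), np.array(data, dtype=complex)


# Hypothetical test system: complex diagonal and unequal off-diagonals make A non-Hermitian.
n = 50
A = tridiag_csr(n, -1 + 0.5j, 4 + 1j, -1 - 0.2j)
b = np.ones(n, dtype=complex)
x, iters = bicgstab(A, b)
rel_res = np.linalg.norm(b - csr_matvec(*A, x)) / np.linalg.norm(b)
```

Each iteration costs two sparse matrix-vector products plus a handful of vector updates and inner products. These are BLAS-like operations that map naturally onto GPU kernels, which is the kind of workload that offloading to several GPUs targets. A production solver would add a preconditioner and run the loop in single or double precision, matching the two precisions compared in the abstract.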