This work deals with the solution of large non-Hermitian linear systems on desktop workstations with multiple graphics processing units (GPUs). While our implementation is motivated by the need to accelerate volume conductor modeling for bioelectrical brain imaging, the problem itself is common in scientific computing. Whenever a complex partial differential equation is numerically solved, a typically non-Hermitian sparse complex linear system needs to be solved. For problem sizes in the millions, this can take a long time even with highly optimized CPU-based solvers. Our GPU-accelerated solver outperforms an optimized OpenMP-based reference running on two quad-core CPUs by a factor of up to 31× in single precision and up to 7× in double precision, at the cost of a very modest hardware upgrade of two dual-GPU GTX 295 graphics cards. A pair of stronger Fermi GPUs (GTX 480) achieves speedups of 30× in single precision and 15× in double precision.