We present a parallel conjugate gradient solver for the Poisson problem, optimized for multi-GPU platforms. Our approach includes a novel heuristic Poisson preconditioner that is well suited to massively parallel SIMD processing. We also address the limited transfer rates of typical data channels, such as the PCI Express bus, relative to the bandwidth requirements of powerful GPUs: in communication-intensive algorithms such as this one, naive communication schemes can severely reduce the achievable speedup. For this reason, we employ overlapping memory transfers to establish a high level of concurrency and to improve scalability. We have implemented our model on a high-performance workstation with multiple hardware accelerators. We discuss the mathematical principles, give implementation details, and present the performance and scalability of the system.
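To make the structure of a preconditioned conjugate gradient iteration concrete, here is a minimal single-threaded sketch in plain Python for a 1D Poisson matrix tridiag(-1, 2, -1). It is an illustration under stated assumptions, not the solver described above: the function names (`poisson_matvec`, `apply_preconditioner`, `pcg`) are hypothetical, the problem is 1D, and the preconditioner is a stand-in approximate inverse of the form K K^T with K = I - L D^{-1} (L the strictly lower part of the matrix, D its diagonal), which may differ from the heuristic preconditioner of this work. It is chosen here only because applying it needs nothing but stencil-like products, the kind of operation that maps naturally onto SIMD hardware. Multi-GPU partitioning and overlapped transfers are not modeled.

```python
import math


def poisson_matvec(v):
    """Apply the 1D Poisson matrix tridiag(-1, 2, -1) without storing it."""
    n = len(v)
    out = [0.0] * n
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out[i] = 2.0 * v[i] - left - right
    return out


def apply_preconditioner(r):
    """Stand-in approximate inverse (an assumption): z = K (K^T r).

    For tridiag(-1, 2, -1), K = I - L D^{-1} has 1 on the diagonal
    and 0.5 on the subdiagonal, so both products are simple stencils.
    """
    n = len(r)
    # t = K^T r: each entry adds half of its right neighbour.
    t = [r[i] + (0.5 * r[i + 1] if i < n - 1 else 0.0) for i in range(n)]
    # z = K t: each entry adds half of its left neighbour.
    return [t[i] + (0.5 * t[i - 1] if i > 0 else 0.0) for i in range(n)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pcg(b, tol=1e-10, max_iter=1000):
    """Preconditioned CG; returns (solution, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the zero initial guess
    if math.sqrt(dot(r, r)) < tol:
        return x, 0
    z = apply_preconditioner(r)
    p = list(z)
    rz = dot(r, z)
    for k in range(1, max_iter + 1):
        ap = poisson_matvec(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if math.sqrt(dot(r, r)) < tol:
            return x, k
        z = apply_preconditioner(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Usage: solve A x = b for a constant right-hand side.
n = 64
solution, iterations = pcg([1.0] * n)
```

In this sketch each iteration consists of one matrix-vector product, one preconditioner application, two dot products, and three vector updates. The dot products are the global reductions, so in a multi-device setting they and the halo exchange inside the matrix-vector product are the natural places where data would have to cross a bus.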