After more than five years since GPUs were first used as accelerators for general scientific computations, the field of General Purpose GPU computing, or GPGPU, has finally reached the mainstream. Developers now have access to a mature hardware and software ecosystem. On the software side, several major open-source packages now support GPU acceleration, while on the hardware side cloud-based solutions provide a simple way to access powerful machines with the latest GPUs at low cost. In this context, we look at the GPU acceleration of CAE, with a focus on the matrix solvers. We compare the performance that can be achieved using the open-source solver package PETSc run on GPU-enabled Amazon EC2 hardware with that of an optimized legacy FEM code run on a latest-generation 12-core blade server. Our results show that, although good performance can be achieved, further development is still needed to reach peak performance.
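PETSc's Krylov solvers include the conjugate gradient method, a standard choice for the symmetric positive-definite systems that arise in FEM. As a minimal illustration of the method itself (a dense NumPy sketch, not the paper's PETSc or GPU implementation):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A."""
    x = np.zeros_like(b)
    r = b - A @ x          # initial residual
    p = r.copy()           # initial search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)   # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:   # converged
            break
        p = r + (rs_new / rs_old) * p  # new conjugate direction
        rs_old = rs_new
    return x

# Small SPD test system
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)
```

In production solvers such as PETSc, the dominant cost per iteration is the sparse matrix-vector product `A @ p`, which is exactly the kernel that benefits most from GPU acceleration.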