Graphics processing units (GPUs) have recently attracted attention for scientific applications such as particle simulations. This is driven partly by the low commodity pricing of GPUs, but also by recent toolkit and library developments that make them more accessible to scientific programmers. We discuss the application of GPU programming to two significantly different paradigms: regular mesh field equations with unusual boundary conditions, and graph analysis algorithms. The differing optimization techniques required for these two paradigms cover many of the challenges faced when developing GPU applications. We discuss the relevance of these application paradigms to simulation engines and games. GPUs were aimed primarily at the accelerated graphics market, but since this market is often closely coupled to advanced game products, it is interesting to speculate about the future of fully integrated accelerator hardware for combined visualization and simulation. As well as reporting speed-up performance on selected simulation paradigms, we discuss suitable data-parallel algorithms and present code examples for exploiting GPU features such as large numbers of threads and localized texture memory. We find a surprising variation in the performance that can be achieved on GPUs for our applications, and discuss how these findings relate to known effects in parallel computing, such as memory-speed-related super-linear speed-up. Copyright © 2009 John Wiley & Sons, Ltd.