Unstructured grid applications on GPU: performance analysis and improvement
Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units
The set of scientific applications suitable for GPUs has grown, driven both by the computational power of GPUs and by the availability of programming languages that make writing scientific applications for GPUs more approachable. However, as problem sizes grow, the global memory of a single GPU becomes a limitation. Multi-GPU systems can make such memory-limited problems tractable by dividing the data and computation among several GPUs, but parallel execution is seriously limited by (i) the application's data dependencies and (ii) data transfers among GPUs. In this paper we analyze the parallelization potential of unstructured grid applications based on the data dependencies of the algorithm and the amount of communication required. Because of these dependencies and the required communication, data-parallel and task-parallel techniques exhibit different communication overheads and compute-device utilization. Based on this analysis, we propose a scheme that exploits both data and task parallelism and reduces the communication overhead by overlapping computation with communication. Our OpenCL implementation reduces the communication overhead by 38%, and, for comparison, a two-GPU implementation provides an almost five-fold performance increase over a CPU implementation.
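
To make the overlap concrete, the sketch below shows how such a computation-communication overlap could be expressed in the OpenCL host API. It is a minimal illustration under stated assumptions, not the paper's implementation: two GPUs with a 1-D domain decomposition, an in-order compute queue plus a dedicated transfer queue per device, hypothetical kernels passed in as interior and halo, and a host-staged halo exchange; setup and error checking are elided.

```c
/* Hedged sketch: computation-communication overlap across two GPUs in
 * OpenCL. Kernel names, offsets, and the host-staged exchange are
 * illustrative assumptions, not the paper's code. */
#include <CL/cl.h>

#define NGPU 2

/* One time step per GPU g:
 *   qc[g]    - in-order compute queue     qx[g]    - transfer queue
 *   grid[g]  - device buffer for g's partition
 *   stage[g] - host memory for the halo g sends to its neighbor */
static void step(cl_command_queue qc[NGPU], cl_command_queue qx[NGPU],
                 cl_kernel interior[NGPU], cl_kernel halo[NGPU],
                 cl_mem grid[NGPU], void *stage[NGPU],
                 size_t n_interior, size_t n_halo, size_t halo_bytes,
                 const size_t off_send[NGPU], const size_t off_recv[NGPU])
{
    cl_event sent[NGPU], recvd[NGPU];

    for (int g = 0; g < NGPU; ++g) {
        /* (1) Interior update: touches no halo cells, so it can run
         *     while the halo data is in flight. */
        clEnqueueNDRangeKernel(qc[g], interior[g], 1, NULL,
                               &n_interior, NULL, 0, NULL, NULL);
        /* (2) Concurrently copy the outgoing halo to host memory on the
         *     separate transfer queue (CL_FALSE = non-blocking). */
        clEnqueueReadBuffer(qx[g], grid[g], CL_FALSE, off_send[g],
                            halo_bytes, stage[g], 0, NULL, &sent[g]);
    }

    for (int g = 0; g < NGPU; ++g) {
        int nb = (g + 1) % NGPU;  /* neighboring GPU */
        /* (3) Forward g's halo into the neighbor's buffer; the event
         *     dependency orders it after the read without blocking
         *     the host thread. */
        clEnqueueWriteBuffer(qx[nb], grid[nb], CL_FALSE, off_recv[nb],
                             halo_bytes, stage[g], 1, &sent[g], &recvd[nb]);
    }

    for (int g = 0; g < NGPU; ++g) {
        /* (4) Boundary update waits for the incoming halo; the in-order
         *     compute queue already orders it after the interior kernel. */
        clEnqueueNDRangeKernel(qc[g], halo[g], 1, NULL,
                               &n_halo, NULL, 1, &recvd[g], NULL);
    }

    for (int g = 0; g < NGPU; ++g) {  /* end-of-step synchronization */
        clFinish(qc[g]);
        clReleaseEvent(sent[g]);
        clReleaseEvent(recvd[g]);
    }
}
```

The dedicated transfer queue is what lets step (2) proceed while step (1) executes; on a single in-order queue the read would be serialized behind the kernel. Two practical caveats, both assumptions here: truly asynchronous transfers usually require pinned host memory (e.g., buffers created with CL_MEM_ALLOC_HOST_PTR), and a strictly spec-conformant variant would carve the halo regions out with clCreateSubBuffer so that the concurrently running kernel and the read never alias the same cl_mem object.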