Techniques for the parallelization of unstructured grid applications on multi-GPU systems

  • Authors: Lizandro Solano-Quinde, Brett Bode, Arun K. Somani
  • Affiliations: Iowa State University, Ames, IA; University of Illinois, Urbana, IL; Iowa State University, Ames, IA
  • Venue: Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
  • Year: 2012

Abstract

The set of scientific applications suitable for running on GPUs has grown, driven by the computational power of GPUs and by programming languages that make writing scientific applications for GPUs more approachable. However, as problem sizes increase, the global memory of a single GPU becomes a limitation. Multi-GPU systems can make such memory-limited problems tractable by dividing the data and computation among several GPUs. Parallel execution, however, is seriously limited by (i) the data dependencies of the application and (ii) the data transfers among GPUs. In this paper we analyze the potential for parallelization of unstructured grid applications based on the data dependencies of the algorithm and the amount of data communication required. Because of these data dependencies and the required communication, data- and task-parallelization techniques exhibit different communication overheads and levels of computing-device utilization. Based on this analysis, we propose a scheme that exploits both data and task parallelism and reduces the communication overhead by overlapping computation with communication. Our OpenCL implementation reduces the communication overhead by 38%, and, for comparison, a two-GPU implementation delivers an almost five-fold performance increase over a CPU implementation.
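
The computation-communication overlap mentioned in the abstract can be illustrated with a short OpenCL host-code sketch. This is not the authors' code: the kernel body, halo width, data partitioning, and all identifiers below are illustrative assumptions. The sketch only shows the general pattern of using separate compute and transfer command queues plus events so that a halo transfer between two GPUs can proceed while the large interior kernels are still running.

/* Minimal sketch (assumptions only): overlapping a halo exchange with
 * interior computation on two GPUs using OpenCL 1.x host code. */
#include <CL/cl.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy per-cell update standing in for an unstructured-grid kernel. */
static const char *src =
    "__kernel void update(__global float *u, __global const float *v, int n) {\n"
    "  int i = get_global_id(0);\n"
    "  if (i < n) u[i] += 0.5f * v[i];\n"
    "}\n";

int main(void) {
    enum { N = 1 << 20, HALO = 4096 };          /* partition size and halo width */
    size_t interior = N - HALO, halo = HALO, off = interior;

    cl_platform_id plat; cl_device_id dev[2]; cl_uint ndev = 0;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 2, dev, &ndev);
    if (ndev < 2) { fprintf(stderr, "two GPUs required for this sketch\n"); return 1; }

    cl_context ctx = clCreateContext(NULL, 2, dev, NULL, NULL, NULL);
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 2, dev, "", NULL, NULL);

    cl_command_queue qc[2], qt[2];              /* compute and transfer queues  */
    cl_kernel k[2]; cl_mem u[2], v[2];
    for (int g = 0; g < 2; ++g) {
        qc[g] = clCreateCommandQueue(ctx, dev[g], 0, NULL);
        qt[g] = clCreateCommandQueue(ctx, dev[g], 0, NULL);
        k[g]  = clCreateKernel(prog, "update", NULL);
        u[g]  = clCreateBuffer(ctx, CL_MEM_READ_WRITE, N * sizeof(float), NULL, NULL);
        v[g]  = clCreateBuffer(ctx, CL_MEM_READ_WRITE, N * sizeof(float), NULL, NULL);
        int n = N;
        clSetKernelArg(k[g], 0, sizeof(cl_mem), &u[g]);
        clSetKernelArg(k[g], 1, sizeof(cl_mem), &v[g]);
        clSetKernelArg(k[g], 2, sizeof(int), &n);
    }

    /* 1. Update GPU 0's boundary (halo) cells first; record an event. */
    cl_event boundary_done, read_done;
    clEnqueueNDRangeKernel(qc[0], k[0], 1, &off, &halo, NULL, 0, NULL, &boundary_done);

    /* 2. Immediately launch the large interior updates on both GPUs. */
    clEnqueueNDRangeKernel(qc[0], k[0], 1, NULL, &interior, NULL, 0, NULL, NULL);
    clEnqueueNDRangeKernel(qc[1], k[1], 1, NULL, &interior, NULL, 0, NULL, NULL);

    /* 3. Move GPU 0's halo to GPU 1's ghost region via the host on the
     *    separate transfer queues.  The transfer waits only on the boundary
     *    kernel, so it runs while both interior kernels are still executing,
     *    hiding part of the communication cost behind computation. */
    float *host_halo = malloc(HALO * sizeof(float));
    clEnqueueReadBuffer(qt[0], u[0], CL_FALSE, off * sizeof(float),
                        HALO * sizeof(float), host_halo, 1, &boundary_done, &read_done);
    clEnqueueWriteBuffer(qt[1], v[1], CL_FALSE, off * sizeof(float),
                         HALO * sizeof(float), host_halo, 1, &read_done, NULL);

    /* 4. Synchronize everything before the next iteration would begin. */
    for (int g = 0; g < 2; ++g) { clFinish(qc[g]); clFinish(qt[g]); }
    printf("interior computation and halo exchange overlapped\n");
    free(host_halo);
    return 0;
}

The key design point the sketch tries to capture is that the boundary work is ordered before the interior work on each device, so the halo data is ready early and its transfer, enqueued on a dedicated queue with an event dependency, does not have to wait for the bulk of the computation.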