Parallelizing SOR for GPGPUs using alternate loop tiling

  • Authors:
  • Peng Di; Hui Wu; Jingling Xue; Feng Wang; Canqun Yang

  • Affiliations:
  • School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia;School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia;School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia;School of Computer Science, National University of Defense Technology, Changsha 410073, China;School of Computer Science, National University of Defense Technology, Changsha 410073, China

  • Venue:
  • Parallel Computing
  • Year:
  • 2012

Abstract

Gauss-Seidel and SOR, which are widely used smoothers in multigrid methods, are difficult to parallelize, particularly on GPGPUs, because of their DOACROSS data dependences. In this paper, we present a new parallel SOR method that admits more efficient data-parallel SIMD execution than red-black SOR on GPGPUs. Our solution is obtained unconventionally: we start from a K-layer SOR method and parallelize it with a non-dependence-preserving scheme that consists of a new domain decomposition technique followed by a loop tiling technique called alternate tiling. Despite its slower convergence, our new method outperforms red-black SOR by striking a better balance between data reuse and parallelism and by trading convergence rate for SIMD parallelism. Our experimental results highlight the importance of synergy between domain experts, compiler optimizations and performance tuning in maximizing the performance of PDE-like DOACROSS loops on GPGPUs.
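To make the DOACROSS dependence and the red-black baseline mentioned in the abstract concrete, the following is a minimal sketch in plain Python. It is not the K-layer or alternate-tiling method of the paper, whose details the abstract does not give. The 5-point Poisson stencil, the helper names (`make_grid`, `relax`, `sor_lexicographic`, `sor_red_black`) and the test problem are all assumptions chosen for illustration.

```python
def make_grid(n, boundary, interior=0.0):
    """Return an n x n grid (list of lists) with a fixed boundary value."""
    return [[boundary if i in (0, n - 1) or j in (0, n - 1) else interior
             for j in range(n)] for i in range(n)]


def relax(u, f, h, omega, i, j):
    """One SOR update of point (i, j) for the 5-point stencil of -laplace(u) = f."""
    gauss_seidel = 0.25 * (u[i - 1][j] + u[i + 1][j]
                           + u[i][j - 1] + u[i][j + 1] + h * h * f[i][j])
    u[i][j] += omega * (gauss_seidel - u[i][j])


def sor_lexicographic(u, f, h, omega, sweeps):
    """Sequential SOR in row-major order, updating u in place."""
    n = len(u)
    for _ in range(sweeps):
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                # u[i-1][j] and u[i][j-1] were already updated in this sweep,
                # so each iteration depends on earlier ones (DOACROSS).
                relax(u, f, h, omega, i, j)
    return u


def sor_red_black(u, f, h, omega, sweeps):
    """Red-black SOR: update all points of one colour, then the other."""
    n = len(u)
    for _ in range(sweeps):
        for colour in (0, 1):
            # Points of one colour read only neighbours of the other colour,
            # so the iterations of this double loop are mutually independent
            # and could be executed in a data-parallel fashion.
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    if (i + j) % 2 == colour:
                        relax(u, f, h, omega, i, j)
    return u


if __name__ == "__main__":
    n = 9
    h = 1.0 / (n - 1)
    f = [[0.0] * n for _ in range(n)]
    lex = sor_lexicographic(make_grid(n, 1.0), f, h, 1.5, 200)
    rb = sor_red_black(make_grid(n, 1.0), f, h, 1.5, 200)
    print(max(abs(lex[i][j] - rb[i][j]) for i in range(n) for j in range(n)))
```

Both orderings converge to the same solution of the test problem but take different intermediate iterates. Reordering the updates removes the loop-carried dependence within each colour, which is the kind of change in iteration behaviour for the sake of parallelism that the abstract discusses.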