Automatic Resource Scheduling with Latency Hiding for Parallel Stencil Applications on GPGPU Clusters

  • Authors:
  • Kumiko Maeda;Masana Murase;Munehiro Doi;Hideaki Komatsu;Shigeho Noda;Ryutaro Himeno

  • Affiliations:
  • -;-;-;-;-;-

  • Venue:
  • IPDPS '12 Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Overlapping computations and communication is a key to accelerating stencil applications on parallel computers, especially for GPU clusters. However, such programming is a time-consuming part of the stencil application development. To address this problem, we developed an automatic code generation tool to produce a parallel stencil application with latency hiding automatically from its dataflow model. With this tool, users visually construct the workflows of stencil applications in a dataflow programming model. Our dataflow compiler determines a data decomposition policy for each application, and generates source code that overlaps the stencil computations and communication (MPI and PCIe). We demonstrate two types of overlapping models, a CPU-GPU hybrid execution model and a GPU-only model. We use a CFD benchmark computing 19-point 3D stencils to evaluate our scheduling performance, which results in 1.45 TFLOPS in single precision on a cluster with 64 Tesla C1060 GPUs.