Physis: an implicitly parallel programming model for stencil computations on large-scale GPU-accelerated supercomputers

  • Authors:
  • Naoya Maruyama; Tatsuo Nomura; Kento Sato; Satoshi Matsuoka

  • Affiliations:
  • Tokyo Institute of Technology, Ookayama, Meguro-ku, Tokyo, Japan; Google, Inc., Roppongi, Minato-ku, Tokyo, Japan; Tokyo Institute of Technology, Ookayama, Meguro-ku, Tokyo, Japan; Tokyo Institute of Technology, Ookayama, Meguro-ku, Tokyo, Japan

  • Venue:
  • Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
  • Year:
  • 2011

Abstract

This paper proposes a compiler-based programming framework that automatically translates user-written structured grid code into scalable parallel implementation code for GPU-equipped clusters. To enable such automatic translation, we design a small set of declarative constructs that allow the user to express stencil computations in a portable and implicitly parallel manner. Our framework translates the user-written code into actual implementation code in CUDA for GPU acceleration and MPI for node-level parallelization, with automatic optimizations such as overlapping computation and communication. We demonstrate the feasibility of such automatic translation by implementing several structured grid applications in our framework. Experimental results on the TSUBAME2.0 GPU-based supercomputer show that performance is comparable to that of hand-written code, with good strong and weak scalability up to 256 GPUs.
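
To make the idea of a declarative, implicitly parallel stencil concrete, the following is a minimal Python sketch and not the framework's actual interface. The names `diffusion_point` and `stencil_map`, the 7-point averaging kernel, and the periodic boundary handling are all assumptions chosen for illustration. The point it shows is the separation the abstract describes: the user writes only the per-point computation, and a separate layer decides how the grid is traversed. Here that layer is a sequential loop; a translator could in principle replace it with GPU kernels and inter-node halo exchange.

```python
# Hypothetical illustration only: these names are not taken from the paper.

def diffusion_point(get, x, y, z):
    """Per-point stencil kernel: average a point with its six face neighbors.

    `get` reads the input grid; the kernel never sees loops, threads,
    or how the grid is partitioned.
    """
    total = (get(x, y, z)
             + get(x - 1, y, z) + get(x + 1, y, z)
             + get(x, y - 1, z) + get(x, y + 1, z)
             + get(x, y, z - 1) + get(x, y, z + 1))
    return total / 7.0


def stencil_map(kernel, grid):
    """Apply `kernel` to every point of a 3D grid (nested lists).

    This reference version is sequential. Because each output point
    depends only on the input grid, the iterations are independent
    and could be executed in parallel by a different backend.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])

    def get(x, y, z):
        # Periodic boundaries (an assumption made for this sketch).
        return grid[x % nx][y % ny][z % nz]

    return [[[kernel(get, x, y, z) for z in range(nz)]
             for y in range(ny)]
            for x in range(nx)]


def make_grid(n, value=0.0):
    """Build an n x n x n grid filled with `value`."""
    return [[[value for _ in range(n)] for _ in range(n)] for _ in range(n)]
```

A caller would build a grid with `make_grid`, set initial values, and call `stencil_map(diffusion_point, grid)` once per time step. Keeping the kernel free of indexing loops is what lets the same user code be retargeted without change.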