Multi-science applications with single codebase - GAMER - for massively parallel architectures

  • Authors:
  • Hemant Shukla; Hsi-Yu Schive; Tak-Pong Woo; Tzihong Chiueh

  • Affiliations:
  • Lawrence Berkeley National Laboratory, Berkeley; National Taiwan University, Taipei, Taiwan; Soochow University, Taipei, Taiwan; National Taiwan University, Taipei, Taiwan

  • Venue:
  • Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
  • Year:
  • 2011

Abstract

The growing need for power-efficient, extreme-scale high-performance computing (HPC), coupled with plateauing clock speeds, is driving the emergence of massively parallel compute architectures. Tens to many hundreds of cores are increasingly made available as compute units, either as an integral part of the main processor or as coprocessors designed for handling massively parallel workloads. In the case of many-core graphics processing units (GPUs), hundreds of SIMD cores primarily designed for image and video rendering are used for high-performance scientific computations. The new architectures typically offer programming models based on extensions to standard C, such as CUDA (NVIDIA) and OpenCL. However, wide-ranging adoption of these parallel architectures is hampered by a steep learning curve and requires reengineering of existing applications, which mostly leads to expensive and error-prone code rewrites without any prior guarantee or knowledge of speedups. A broad range of complex scientific applications across many domains use common algorithms and techniques, such as adaptive mesh refinement (AMR), advanced hydrodynamics partial differential equation (PDE) solvers, and Poisson-gravity solvers, that have been shown to perform efficiently on GPU-based systems. Taking advantage of these commonalities, we use the GPU-aware AMR code GAMER [1] to examine a unique approach of solving multi-science problems in astrophysics, hydrodynamics, and particle physics with a single codebase. We demonstrate significant speedups in disparate classes of scientific applications on three separate clusters: Dirac, Laohu, and Mole 8.5. By extensively reusing the extendable single codebase, we mitigate the impediment of significant code rewrites. We also collect performance and energy consumption benchmark metrics on the 50-node NVIDIA C2050 GPU and Intel 8-core Nehalem CPU configuration of the Dirac cluster at the National Energy Research Scientific Computing Center (NERSC).
In addition, we propose a strategy and framework for legacy and new applications to successfully leverage the evolving GAMER codebase on massively parallel architectures. The framework and the benchmarks are intended to help quantify adoption strategies for legacy and new scientific applications.