The microphysical processes that lead to cloud and precipitation formation are among the most important physical processes in numerical weather prediction (NWP) and climate models. The Weather Research and Forecasting (WRF) Single Moment 6-class (WSM6) microphysics scheme in the Global/Regional Assimilation and Prediction System (GRAPES) carries prognostic variables for water vapor, cloud water, cloud ice, rain, snow, and graupel. WSM6 is the most time-consuming component of the entire GRAPES model. In recent years, with the advent of the Compute Unified Device Architecture (CUDA), modern graphics processing units (GPUs), which offer low-power, low-cost, high-performance computing capacity, have been exploited to carry out the arithmetic operations in scientific and engineering simulations. In this paper, we present a GPU implementation of the WSM6 scheme in GRAPES to accelerate its computation. After a brief introduction to the WSM6 scheme, the data dependences governing its GPU implementation are analyzed. A data-parallel method is employed to exploit the massive fine-grained parallelism, and the CUDA programming model is used to convert the original WSM6 module into GPU programs. To achieve high computational performance, we propose mapping the horizontal domain onto an optimal thread-block size. Experimental results demonstrate that the GPU version achieves over a 140x speedup compared with the serial CPU version, making it an efficient parallel implementation.
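The decomposition described above can be sketched as a CUDA kernel. This is a minimal illustration, not the authors' code: it assumes the common layout in which each thread owns one horizontal (i, j) grid column and loops serially over the k vertical levels, and all names (`nx`, `ny`, `nz`, `qv`, `qc`, `qr`) and the 32x4 block size are illustrative placeholders, not values from the paper.

```cuda
#include <cuda_runtime.h>

// One thread per horizontal (i, j) column; the vertical k loop stays serial
// inside the thread. Field arrays are 3-D, linearized as [k][j][i].
__global__ void wsm6_columns(float *qv, float *qc, float *qr,
                             int nx, int ny, int nz)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // horizontal x index
    int j = blockIdx.y * blockDim.y + threadIdx.y;   // horizontal y index
    if (i >= nx || j >= ny) return;                  // guard partial blocks

    for (int k = 0; k < nz; ++k) {
        int idx = (k * ny + j) * nx + i;
        // The real WSM6 tendency calculations for this grid cell would go
        // here; the update below is only a placeholder.
        qv[idx] = fmaxf(qv[idx] - 0.1f * qc[idx], 0.0f);
        (void)qr;
    }
}

// Host-side launch: the block shape is the tunable the abstract refers to
// when it mentions choosing an optimal block size for the horizontal domain.
void launch_wsm6(float *qv, float *qc, float *qr, int nx, int ny, int nz)
{
    dim3 block(32, 4);
    dim3 grid((nx + block.x - 1) / block.x,
              (ny + block.y - 1) / block.y);
    wsm6_columns<<<grid, block>>>(qv, qc, qr, nx, ny, nz);
}
```

Because microphysics columns do not exchange data horizontally, each thread can work independently, which is the fine-grained parallelism the abstract exploits; the grid dimensions simply tile the whole horizontal domain.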