Satellite-observed radiance is a nonlinear functional of surface properties and of atmospheric temperature and absorbing gas profiles, as described by the radiative transfer equation (RTE). In the era of hyperspectral sounders with thousands of high-resolution channels, computing the radiative transfer model becomes more time-consuming. The performance of the radiative transfer model in operational numerical weather prediction systems still limits the number of hyperspectral sounder channels that can be used to only a few hundred. To take full advantage of such high-resolution infrared observations, a computationally efficient radiative transfer model is needed to facilitate satellite data assimilation. In recent years the programmable commodity graphics processing unit (GPU) has evolved into a highly parallel, multi-threaded, many-core processor with tremendous computational speed and very high memory bandwidth. The radiative transfer model is well suited to GPU implementation because the radiances of many channels can be calculated in parallel, which takes advantage of the hardware's efficiency and parallelism. In this paper, we develop a GPU-based high-performance radiative transfer model for the Infrared Atmospheric Sounding Interferometer (IASI), launched in 2006 onboard METOP-A, the first of the European meteorological polar-orbiting satellites. Each IASI spectrum has 8461 spectral channels. The IASI radiative transfer model consists of three modules. The first module, which computes the regression predictors, takes less than 0.004% of CPU time, while the second module (transmittance computation) and the third module (radiance computation) take approximately 92.5% and 7.5%, respectively. Our GPU-based IASI radiative transfer model is developed to run on a low-cost personal supercomputer with four GPUs and a total of 960 compute cores, delivering nearly 4 TFlops of theoretical peak performance.
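The module timing fractions quoted above indicate how much speedup is available in principle. The following is a minimal sketch, not part of the original work, that applies Amdahl's law under the assumption that the first module is left unaccelerated and that its share of CPU time is exactly the quoted upper bound of 0.004%. The function name and constants are illustrative.

```python
def amdahl_speedup(serial_fraction, parallel_speedup):
    """Overall speedup when only the non-serial part is accelerated."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)


# Assumption: module 1 (regression predictors) is not accelerated and takes
# exactly the 0.004% upper bound quoted in the text.
SERIAL_FRACTION = 0.004 / 100.0

# Limit of Amdahl's law as the accelerated part becomes infinitely fast.
upper_bound = 1.0 / SERIAL_FRACTION

if __name__ == "__main__":
    print(f"Speedup ceiling with module 1 unaccelerated: {upper_bound:.0f}x")
    for s in (100.0, 1000.0, 10000.0):
        print(f"modules 2+3 accelerated {s:>7.0f}x -> overall {amdahl_speedup(SERIAL_FRACTION, s):.1f}x")
```

Under these assumptions the ceiling is about 25,000x, so the small first module does not by itself prevent speedups in the range of a few hundred to a few thousand.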
By massively parallelizing the second and third modules, we reached a 364x speedup with one GPU and a 1455x speedup with all four GPUs, both with respect to the original single-threaded CPU-based Fortran code compiled with -O2 optimization. The 1455x speedup on a computer with four GPUs means that the proposed GPU-based high-performance forward model can compute one day's worth of IASI spectra (1,296,000) in about 10 minutes, whereas the original single-CPU version would take an impractical 10 days or more. The model runs at over 80% of the theoretical memory bandwidth with asynchronous data transfer. A novel CPU-GPU pipeline implementation of the IASI radiative transfer model is proposed. The GPU-based high-performance IASI radiative transfer model is suitable for assimilating IASI radiance observations into the operational numerical weather forecast model.
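The throughput figures above can be cross-checked with simple arithmetic. The sketch below is illustrative only. It assumes the four-GPU run takes exactly 10 minutes (the text says "nearly"), and the variable names are hypothetical.

```python
SPECTRA_PER_DAY = 1_296_000   # one day's worth of IASI spectra, from the text
SPEEDUP_1_GPU = 364.0         # reported single-GPU speedup
SPEEDUP_4_GPU = 1455.0        # reported four-GPU speedup
GPU_MINUTES = 10.0            # assumption: "nearly 10 min" taken as exactly 10

# CPU time implied by the four-GPU time and the reported speedup.
cpu_days = GPU_MINUTES * SPEEDUP_4_GPU / (60 * 24)

# Fraction of ideal linear scaling achieved when going from 1 to 4 GPUs.
scaling_efficiency = SPEEDUP_4_GPU / (4 * SPEEDUP_1_GPU)

# Sustained throughput of the four-GPU system.
gpu_spectra_per_second = SPECTRA_PER_DAY / (GPU_MINUTES * 60)

if __name__ == "__main__":
    print(f"Implied single-CPU time: {cpu_days:.1f} days")
    print(f"Multi-GPU scaling efficiency: {scaling_efficiency:.4f}")
    print(f"Four-GPU throughput: {gpu_spectra_per_second:.0f} spectra/s")
```

With these inputs the implied CPU time is roughly 10.1 days, consistent with the figure of more than 10 days, and the step from one to four GPUs is almost perfectly linear (1455 against an ideal 1456).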