Geoscience simulations rely heavily on high-performance computing (HPC) systems. Many heterogeneous CPU/GPU HPC systems have been built to date, and many geoscience simulations have been run on them. In most of these simulations on GPU clusters, only the GPUs perform the arithmetic while the computational capacity of the CPUs is left unused, so the computing resources of the HPC system as a whole are underutilized. In this paper, we perform a long-wave radiation simulation that exploits the computational capacity of both the CPUs and the GPUs of the Tianhe-1A supercomputer. First, the long-wave radiation process is accelerated on a Tesla M2050 GPU, achieving a significant speedup over the baseline performance on a single Intel X5670 CPU core. Second, a workload distribution scheme based on speedup feedback is proposed and validated with various workloads. Third, a parallel programming model (MPI+OpenMP/CUDA) is presented and used to simulate the radiation physics on large GPU clusters. Finally, we address computational efficiency by exploiting the available computing resources of the Tianhe-1A supercomputer. Experimental results demonstrate that the hybrid version completes in much less time than its CPU counterpart; the two versions also show similar sensitivity to the temporal resolution of the radiation process.
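The abstract does not spell out the workload distribution scheme, so the following is only a minimal sketch of one way speedup feedback could drive a CPU/GPU split. It assumes the work is a set of independent atmospheric columns, that the GPU speedup is measured relative to a single CPU core, and that a fixed number of CPU cores are available for computation. The function names `split_columns` and `update_speedup` and all numeric values are hypothetical and do not come from the paper.

```python
def split_columns(n_columns, gpu_speedup, cpu_cores):
    """Split columns between GPU and CPU in proportion to estimated throughput.

    Assumption: one GPU is worth `gpu_speedup` CPU cores, so it receives
    gpu_speedup / (gpu_speedup + cpu_cores) of the work.
    """
    if n_columns < 0 or gpu_speedup <= 0 or cpu_cores < 0:
        raise ValueError("invalid workload or throughput estimate")
    gpu_share = gpu_speedup / (gpu_speedup + cpu_cores)
    gpu_columns = round(n_columns * gpu_share)
    return gpu_columns, n_columns - gpu_columns


def update_speedup(gpu_columns, gpu_seconds, cpu_columns, cpu_seconds, cpu_cores):
    """Re-estimate the GPU speedup over one CPU core from measured timings."""
    if min(gpu_columns, cpu_columns, cpu_cores) <= 0:
        raise ValueError("both devices must have processed some columns")
    if gpu_seconds <= 0 or cpu_seconds <= 0:
        raise ValueError("timings must be positive")
    gpu_rate = gpu_columns / gpu_seconds                # columns per second on the GPU
    core_rate = cpu_columns / cpu_seconds / cpu_cores   # columns per second per CPU core
    return gpu_rate / core_rate


if __name__ == "__main__":
    # Hypothetical values: 1000 columns, initial speedup guess 15, 5 compute cores.
    gpu_cols, cpu_cols = split_columns(1000, 15.0, 5)
    # Pretend the GPU finished in 1.0 s and the CPU cores in 2.0 s.
    measured = update_speedup(gpu_cols, 1.0, cpu_cols, 2.0, 5)
    # Feed the measured speedup back into the next split.
    print(split_columns(1000, measured, 5))
```

In this sketch the feedback loop shifts work toward whichever side finished early, so that both sides finish at about the same time on the next radiation step. A production scheme would probably also smooth the measured speedup over several steps to damp timing noise.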