The compute capability of traditional cluster systems can no longer keep up with the computing needs of practical applications, and issues such as energy and space have become serious problems. Stream processors (SPs), as parallel computing devices, offer high floating-point performance. NVIDIA GPUs are typical stream processor devices, and CUDA makes developing parallel programs on GPUs more flexible. In this paper, we use a hybrid parallel computing programming environment (HPCPE) based on MPI and CUDA to build a simple CPU + GPU stream processor cluster system. We also propose the "Two Level Model" (TLM), which separates compute-intensive tasks from control tasks and exploits the compute capability of contemporary GPUs to accelerate the computing tasks. Finally, we run an experiment on the N-body problem and verify that the stream processor cluster system outperforms the traditional cluster system.
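The separation that the Two Level Model describes can be illustrated with a minimal sketch. It is not the paper's implementation: it is plain sequential Python, with no MPI or CUDA calls, and every name in it (`partition`, `accelerations_for_slice`, `two_level_accelerations`, `G`, `SOFTENING`) is a hypothetical chosen for illustration. The control level splits the bodies of an N-body system into slices, standing in for what MPI processes might coordinate, and the compute level evaluates direct-sum gravitational accelerations for one slice, standing in for the work a GPU kernel might do.

```python
import math

G = 1.0            # gravitational constant in arbitrary units (assumption)
SOFTENING = 1e-9   # small term that avoids division by zero (assumption)


def accelerations_for_slice(positions, masses, start, stop):
    """Compute level: direct-sum accelerations for bodies start..stop-1.

    Stands in for the compute-intensive part that would run on a GPU.
    """
    result = []
    for i in range(start, stop):
        xi, yi, zi = positions[i]
        ax = ay = az = 0.0
        for j, (xj, yj, zj) in enumerate(positions):
            if i == j:
                continue
            dx, dy, dz = xj - xi, yj - yi, zj - zi
            dist_sq = dx * dx + dy * dy + dz * dz + SOFTENING
            scale = G * masses[j] / (dist_sq * math.sqrt(dist_sq))
            ax += scale * dx
            ay += scale * dy
            az += scale * dz
        result.append((ax, ay, az))
    return result


def partition(n_bodies, n_workers):
    """Control level: split body indices into contiguous, near-equal slices."""
    base, extra = divmod(n_bodies, n_workers)
    bounds = []
    start = 0
    for worker in range(n_workers):
        stop = start + base + (1 if worker < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def two_level_accelerations(positions, masses, n_workers):
    """Control level: hand out slices, run the compute level, gather in order.

    The loop is sequential here; in a real cluster each slice would be
    processed by a different node.
    """
    gathered = []
    for start, stop in partition(len(positions), n_workers):
        gathered.extend(accelerations_for_slice(positions, masses, start, stop))
    return gathered
```

Because each body's acceleration depends only on the full set of positions and masses, the gathered result does not depend on how many slices the control level creates. This independence is what makes the N-body problem a convenient test case for this kind of decomposition.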