To leverage multiple GPUs in a cluster system, application tasks must be assigned to the GPUs and executed with appropriate use of communication primitives to transfer data among them. In current GPU programming models, communication primitives such as MPI functions cannot be called within GPU kernels; instead, they must be invoked from CPU code. Programmers therefore have to coordinate both the GPU kernel and the CPU code for data communication, which makes GPU programming and its optimization very difficult. In this paper, we propose a programming framework named FLAT, which enables programmers to use MPI functions within GPU kernels. Our framework automatically transforms MPI functions written in a GPU kernel into runtime routines executed on the CPU. We describe the execution model and implementation of FLAT, and discuss its applicability in terms of scalability and programmability. We also evaluate the performance of FLAT; the results show that FLAT achieves good scalability for the intended applications.