To leverage multiple GPUs in a cluster system, application tasks must be assigned to the GPUs and executed with appropriate use of communication primitives to transfer data among them. In current GPU programming models, communication primitives such as MPI functions cannot be called within GPU kernels; instead, they must be invoked from CPU code. Programmers therefore have to coordinate both the GPU kernel and the CPU code for data communication, which makes GPU programming and its optimization very difficult. In this paper, we propose a programming framework named FLAT, which enables programmers to use MPI functions within GPU kernels. Our framework automatically transforms MPI functions written in a GPU kernel into runtime routines executed on the CPU. We describe the execution model and implementation of FLAT, and discuss its applicability in terms of scalability and programmability. We also evaluate the performance of FLAT; the results show that FLAT achieves good scalability for the intended applications.