Automating GPU computing in MATLAB

Authors:
Chun-Yu Shei; Pushkar Ratnalikar; Arun Chauhan
Affiliations:
Indiana University, Bloomington, IN, USA; Indiana University, Bloomington, IN, USA; Indiana University, Bloomington, IN, USA
Venue:
Proceedings of the international conference on Supercomputing
Year:
2011

Abstract

MATLAB is a popular software platform for scientific and engineering software writers. It offers a high level of abstraction for fundamental mathematical operations and extensive, highly optimized domain-specific libraries for several scientific and engineering disciplines. With the recent availability of GPU libraries for MATLAB, it has become possible to easily exploit GPGPUs as coprocessors. However, this requires changing the code by carefully declaring the variables that will live on the GPU, which breaks the simplicity of the MATLAB programming model. We present a fully automatic source-level compilation technique to exploit a given GPU library for MATLAB, enabling coarse-grained heterogeneous parallelism across CPU and GPU. Our approach is based on empirically characterizing the library's functions in order to build a comparative model of their performance on the CPU and GPU. This model is then used, along with a data communication cost model, to maximize parallelism by selectively offloading some computation to the GPU. We achieve this by formulating the problem as a binary integer linear programming problem aimed at minimizing CPU-GPU data movement, and by using a hierarchical approach to keep the computational complexity in check. We have implemented our approach in a source-level MATLAB compiler, and present experimental results on a set of MATLAB kernels and applications using the GPUmat library. We show speedups of up to 7 times when the GPU is harnessed, compared to a standalone 8-core CPU.
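
The placement decision described in the abstract can be illustrated with a toy example. The following Python sketch is not the authors' formulation: it assumes a four-operation chain with made-up CPU and GPU costs and a made-up transfer cost per edge, uses a simplified objective (total compute time plus transfer time, ignoring any overlap between CPU and GPU work), and replaces a real integer programming solver with exhaustive search over the 0/1 variables. All names (`cpu_cost`, `gpu_cost`, `edges`, `total_cost`, `best_placement`) are hypothetical.

```python
from itertools import product

# Hypothetical per-operation costs in arbitrary time units (not from the paper).
cpu_cost = {"A": 4.0, "B": 9.0, "C": 8.0, "D": 1.0}
gpu_cost = {"A": 6.0, "B": 2.0, "C": 1.5, "D": 3.0}

# Data-flow edges (producer, consumer) with a hypothetical transfer cost that is
# paid only when the two operations are placed on different devices.
edges = {("A", "B"): 1.0, ("B", "C"): 1.0, ("C", "D"): 1.0}


def total_cost(assign):
    """Cost of a placement, where assign[op] is 0 for CPU and 1 for GPU."""
    compute = sum(gpu_cost[op] if assign[op] else cpu_cost[op] for op in assign)
    transfer = sum(c for (u, v), c in edges.items() if assign[u] != assign[v])
    return compute + transfer


def best_placement():
    """Try every 0/1 placement; feasible only for very small graphs."""
    ops = sorted(cpu_cost)
    best = None
    for bits in product((0, 1), repeat=len(ops)):
        assign = dict(zip(ops, bits))
        cost = total_cost(assign)
        if best is None or cost < best[1]:
            best = (assign, cost)
    return best


if __name__ == "__main__":
    placement, cost = best_placement()
    print(placement, cost)
```

With these assumed numbers, the cheapest placement keeps `A` and `D` on the CPU and offloads `B` and `C` to the GPU: the two transfers it incurs cost less than running `B` and `C` on the CPU. Exhaustive search grows as 2^n in the number of operations, which suggests why a solver and a hierarchical decomposition are needed for real programs.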