Static scheduling algorithms for allocating directed task graphs to multiprocessors
ACM Computing Surveys (CSUR)
An updated set of basic linear algebra subprograms (BLAS)
ACM Transactions on Mathematical Software (TOMS)
Compiling MATLAB Programs to ScaLAPACK: Exploiting Task and Data Parallelism
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Optimal scheduling of independent tasks on heterogeneous computing systems
ACM '74 Proceedings of the 1974 annual conference - Volume 1
pMatlab Parallel Matlab Library
International Journal of High Performance Computing Applications
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
OpenMP to GPGPU: a compiler framework for automatic translation and optimization
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
A control-structure splitting optimization for GPGPU
Proceedings of the 6th ACM conference on Computing frontiers
OpenMPC: Extended OpenMP Programming and Tuning for GPUs
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
McFLAT: a profile-based framework for MATLAB loop analysis and transformations
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Automatic C-to-CUDA code generation for affine programs
CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction
SemCache: semantics-aware caching for efficient GPU offloading
Proceedings of the 27th international ACM conference on International conference on supercomputing
Load balancing in a changing world: dealing with heterogeneity and performance variability
Proceedings of the ACM International Conference on Computing Frontiers
Time-stepping methods for the simulation of the self-assembly of nano-crystals in Matlab on a GPU
Journal of Computational Physics
Hi-index | 0.00 |
MATLAB is a popular software platform for scientific and engineering software writers. It offers a high level of abstraction for fundamental mathematical operations and extensive highly optimized domain-specific libraries for several scientific and engineering disciplines. With the recent availability of GPU libraries for MATLAB, it has become possible to easily exploit GPGPUs as coprocessors. However, this requires changing the code by carefully declaring variables that would live on the GPU, breaking the simplicity of the MATLAB programming model. We present a fully automatic source-level compilation technique to exploit a given GPU library for MATLAB, enabling coarse-grained heterogeneous parallelism across CPU and GPU. Our approach is based on empirically characterizing the library's functions, in order to build a comparative model of their performance on the CPU and GPU, which is then used along with a data communication cost model to maximize parallelism by selectively offloading some computation on the GPU. We achieve this by phrasing the problem as a binary integer linear programming problem aimed at minimizing CPU-GPU data movement, and using a hierarchical approach to keep the computational complexity in check. We have implemented our approach in a source-level MATLAB compiler, and present experimental results on a set of MATLAB kernels and applications using the GPUmat library. We show speedups of up to 7 times when the GPU is harnessed, compared to a standalone 8-core CPU.