Offload – automating code migration to heterogeneous multicore systems

Authors:
Pete Cooper;Uwe Dolinsky;Alastair F. Donaldson;Andrew Richards;Colin Riley;George Russell
Affiliations:
Codeplay Software Ltd., Edinburgh, UK;Codeplay Software Ltd., Edinburgh, UK;Oxford University Computing Laboratory, Oxford, UK;Codeplay Software Ltd., Edinburgh, UK;Codeplay Software Ltd., Edinburgh, UK;Codeplay Software Ltd., Edinburgh, UK
Venue:
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Year:
2010

Abstract

We present Offload, a programming model for offloading parts of a C++ application to run on accelerator cores in a heterogeneous multicore system. Code to be offloaded is enclosed in an offload scope; all functions called indirectly from an offload scope are compiled for the accelerator cores. Data defined inside an offload scope resides in accelerator memory, while data defined outside it resides in host memory; code to move data between the two memory spaces is generated automatically by the compiler. This is achieved by distinguishing between host and accelerator pointers at the type level, and by compiling multiple versions of each function, one per configuration of its pointer parameters, using automatic call-graph duplication. We discuss solutions to several challenging issues related to call-graph duplication, and present an implementation of Offload for the Cell BE processor, evaluated using a number of benchmarks.
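
The idea of separating host and accelerator pointers at the type level, and producing one function version per pointer parameter configuration, can be approximated in standard C++ with templates. The sketch below is only an illustration under stated assumptions: it does not use the Offload language or its syntax, and the names `HostSpace`, `AccelSpace`, `SpacePtr`, `fetch_from_host`, `load`, `dot` and `example_mixed` are hypothetical. A plain `memcpy` stands in for the data transfer that the abstract says the compiler generates automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical memory-space tags (assumption: not part of the Offload language).
struct HostSpace {};
struct AccelSpace {};

// A pointer whose type records which memory space it points into.
template <typename T, typename Space>
struct SpacePtr {
    T* raw;
};

// Stand-in for a host-to-accelerator transfer. A real system would issue a
// DMA request here; memcpy is used purely for illustration.
inline void fetch_from_host(int* local, SpacePtr<const int, HostSpace> src,
                            std::size_t n) {
    assert(local != nullptr);
    std::memcpy(local, src.raw, n * sizeof(int));
}

// Element access, selected by the memory space encoded in the pointer type.
inline int load(SpacePtr<const int, AccelSpace> p, std::size_t i) {
    return p.raw[i];  // already in accelerator memory: direct access
}
inline int load(SpacePtr<const int, HostSpace> p, std::size_t i) {
    int tmp = 0;      // host memory: copy the element in before using it
    fetch_from_host(&tmp, SpacePtr<const int, HostSpace>{p.raw + i}, 1);
    return tmp;
}

// One source-level function. Each distinct combination of pointer spaces
// yields a separate instantiation, loosely mirroring the duplication of
// functions per pointer parameter configuration.
template <typename SpaceA, typename SpaceB>
int dot(SpacePtr<const int, SpaceA> a, SpacePtr<const int, SpaceB> b,
        std::size_t n) {
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += load(a, i) * load(b, i);
    }
    return sum;
}

// Mixed configuration: first argument in host memory, second in accelerator
// memory. This instantiates dot<HostSpace, AccelSpace>.
inline int example_mixed() {
    static const int host_data[3] = {1, 2, 3};
    const int local_data[3] = {4, 5, 6};
    SpacePtr<const int, HostSpace> h{host_data};
    SpacePtr<const int, AccelSpace> l{local_data};
    return dot(h, l, 3);
}
```

In this sketch the programmer writes the space tags by hand, whereas the abstract describes the compiler inferring them from where data is defined and duplicating the call graph automatically. The template version only shows why a single source function can need several compiled versions.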