We present Offload, a programming model for offloading parts of a C++ application to run on accelerator cores in a heterogeneous multicore system. Code to be offloaded is enclosed in an offload scope; all functions called indirectly from an offload scope are compiled for the accelerator cores. Data defined inside an offload scope resides in accelerator memory, data defined outside it resides in host memory, and the compiler automatically generates the code that moves data between the two memory spaces. This is achieved by distinguishing between host and accelerator pointers at the type level, and by compiling multiple versions of each function, one per configuration of its pointer parameters, using automatic call-graph duplication. We discuss solutions to several challenging issues related to call-graph duplication, and present an implementation of Offload for the Cell BE processor, evaluated using a number of benchmarks.
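The idea of compiling one function version per pointer-parameter configuration can be illustrated in standard C++. The sketch below is not Offload syntax and not the authors' implementation: it uses ordinary templates as a stand-in for the compiler's automatic call-graph duplication, and every name in it (`HostSpace`, `AccelSpace`, `SpacePtr`, `load`, `dot`, `transfer_count`, `demo_transfers`) is hypothetical. A `memcpy` plus a counter stands in for a real transfer between memory spaces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical memory-space tags (assumption; not part of the Offload language).
struct HostSpace {};
struct AccelSpace {};

// A pointer whose type records which memory space it points into.
template <typename T, typename Space>
struct SpacePtr {
    T* raw;
};

// Counts simulated transfers so the behaviour of each version is observable.
inline int& transfer_count() {
    static int count = 0;
    return count;
}

// Accelerator pointer: a plain local access, no transfer needed.
template <typename T>
T load(SpacePtr<T, AccelSpace> p, std::size_t i) {
    return p.raw[i];
}

// Host pointer: the memcpy stands in for a transfer into accelerator memory.
template <typename T>
T load(SpacePtr<T, HostSpace> p, std::size_t i) {
    T tmp;
    std::memcpy(&tmp, p.raw + i, sizeof(T));
    ++transfer_count();
    return tmp;
}

// One source function. Each distinct (SA, SB) pair yields a separate
// instantiation, loosely mirroring one duplicated version per pointer
// parameter configuration.
template <typename SA, typename SB>
int dot(SpacePtr<int, SA> a, SpacePtr<int, SB> b, std::size_t n) {
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += load(a, i) * load(b, i);
    }
    return sum;
}

// Calls three configurations of dot and returns the number of simulated transfers.
int demo_transfers() {
    int x[3] = {1, 2, 3};
    int y[3] = {4, 5, 6};
    transfer_count() = 0;
    int r1 = dot(SpacePtr<int, HostSpace>{x}, SpacePtr<int, HostSpace>{y}, 3);
    int r2 = dot(SpacePtr<int, AccelSpace>{x}, SpacePtr<int, HostSpace>{y}, 3);
    int r3 = dot(SpacePtr<int, AccelSpace>{x}, SpacePtr<int, AccelSpace>{y}, 3);
    assert(r1 == r2 && r2 == r3);  // same result regardless of memory space
    return transfer_count();
}
```

In this sketch the caller must spell out the memory space of each pointer, whereas the abstract describes a system in which the compiler infers the space from where the data is defined and generates the needed versions itself.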