Cilk: an efficient multithreaded runtime system
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Interprocedural array region analyses
International Journal of Parallel Programming - Special issue: selected papers from the eighth international workshop on languages and compilers for parallel computing
Symbolic bounds analysis of pointers, array indices, and accessed memory regions
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
A unified approach to global program optimization
POPL '73 Proceedings of the 1st annual ACM SIGACT-SIGPLAN symposium on Principles of programming languages
CIGAR: Application Partitioning for a CPU/Coprocessor Architecture
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Compiler Analysis of the Value Ranges for Variables
IEEE Transactions on Software Engineering
Larrabee: a many-core x86 architecture for visual computing
ACM SIGGRAPH 2008 papers
CUBA: an architecture for efficient CPU/co-processor data communication
Proceedings of the 22nd annual international conference on Supercomputing
A practical automatic polyhedral parallelizer and locality optimizer
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
OpenMP to GPGPU: a compiler framework for automatic translation and optimization
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Automatic Offloading of C++ for the Cell BE Processor: A Case Study Using Offload
CISIS '10 Proceedings of the 2010 International Conference on Complex, Intelligent and Software Intensive Systems
Data-aware scheduling of legacy kernels on heterogeneous platforms with distributed memory
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Automatic CPU-GPU communication management and optimization
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Induction variable analysis with delayed abstractions
HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
A hybrid approach of OpenMP for clusters
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Automatic C-to-CUDA code generation for affine programs
CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction
Polyhedral parallel code generation for CUDA
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
COSMIC: middleware for high performance and reliable multiprocessing on xeon phi coprocessors
Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
Proceedings of the 27th international ACM conference on International conference on supercomputing
Semi-automatic restructuring of offloadable tasks for many-core accelerators
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Hi-index | 0.00 |
Intel MIC (Many Integrated Core) is the first x86-based coprocessor architecture aimed at accelerating multi-core HPC applications. In the most common usage model, parallel code sections are offloaded to the MIC coprocessor using LEO (Language Extensions for Offload). The developer is responsible for identifying and specifying offloadable code regions, managing data transfers between the CPU and MIC and optimizing the application for performance, which requires some amount of effort and experimentation. In this paper, we present Apricot, an optimizing compiler and productivity tool for x86-compatible many-core coprocessors (such as Intel MIC) that minimizes developer effort by (i) automatically inserting LEO clauses for parallelizable code regions, (ii) selectively offloading some of the code regions to the coprocessor at runtime based on a cost model that we have developed, (iii) applying a set ofoptimizations for minimizing the data communication overhead and improving overall performance. Apricot is intended to assist programmers in porting existing multi-core applications and writing new ones to take advantage of the many-core coprocessor, while maximizing overall performance. Experiments with SpecOMP and NAS Parallel benchmarks show that Apricot can successfully transform OpenMP applications to run on the MIC coprocessor with good performance gains.