StreamIt: A Language for Streaming Applications
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Compiler and Runtime Support for Running OpenMP Programs on Pentium-and Itanium-Architectures
HIPS '03 Proceedings of the Eighth International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS'03)
Cg: a system for programming graphics hardware in a C-like language
ACM SIGGRAPH 2003 Papers
Programmable Stream Processors
Computer
Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance
Proceedings of the 31st annual international symposium on Computer architecture
Brook for GPUs: stream computing on graphics hardware
ACM SIGGRAPH 2004 Papers
Metaprogramming GPUs with Sh
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Optimizing Compiler for the CELL Processor
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Multiple Instruction Stream Processor
Proceedings of the 33rd annual international symposium on Computer Architecture
Accelerator: using data parallelism to program GPUs for general-purpose uses
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
A performance-oriented data parallel virtual machine for GPUs
ACM SIGGRAPH 2006 Sketches
A memory model for scientific algorithms on graphics processors
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Performance evaluation of GPUs using the RapidMind development platform
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
A programming model for an embedded media processing architecture
SAMOS'05 Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Proceedings of the 21st annual international conference on Supercomputing
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Merge: a programming model for heterogeneous multi-core systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Managing Multicore with OpenMP (Extended Abstract)
Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
SoC-C: efficient programming abstractions for heterogeneous multicore systems on chip
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Pangaea: a tightly-coupled IA32 heterogeneous chip multiprocessor
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
A comparison of programming models for multiprocessors with explicitly managed memory hierarchies
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
ARCS '09 Proceedings of the 22nd International Conference on Architecture of Computing Systems
ARC '09 Proceedings of the 5th International Workshop on Reconfigurable Computing: Architectures, Tools and Applications
Programming model for a heterogeneous x86 platform
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
An Embrace-and-Extend Approach to Managing the Complexity of Future Heterogeneous Systems
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Building heterogeneous reconfigurable systems with a hardware microkernel
CODES+ISSS '09 Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis
Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Accurately evaluating application performance in simulated hybrid multi-tasking systems
Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays
Compiling Python to a hybrid execution environment
Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
Conservation cores: reducing the energy of mature computations
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Bias scheduling in heterogeneous multi-core architectures
Proceedings of the 5th European conference on Computer systems
Proceedings of the 24th ACM International Conference on Supercomputing
Thread tailor: dynamically weaving threads together for efficient, adaptive parallel applications
Proceedings of the 37th annual international symposium on Computer architecture
A Capabilities-Aware Programming Model for Asymmetric High-End Systems
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
An OpenCL framework for heterogeneous multicores with local memory
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
MapCG: writing parallel program portable between CPU and GPU
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Embracing heterogeneity: parallel programming for changing hardware
HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
Adaptive line size cache for irregular references on cell multicore processor
NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing
A domain-specific approach to heterogeneous parallelism
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Cost-aware function migration in heterogeneous systems
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Bothnia: a dual-personality extension to the Intel integrated graphics driver
ACM SIGOPS Operating Systems Review
Bridging functional heterogeneity in multicore architectures
ACM SIGOPS Operating Systems Review
Optimizing the exploitation of multicore processors and GPUs with OpenMP and OpenCL
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Low-energy automated scheduling of computing resources
Proceedings of the 1st ACM/IEEE workshop on Autonomic computing in economics
IWOMP'11 Proceedings of the 7th international conference on OpenMP in the Petascale era
A survey of the practice of computational science
State of the Practice Reports
Seamlessly portable applications: Managing the diversity of modern heterogeneous systems
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Reflex: using low-power processors in smartphones without knowing them
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
BrickX: building hybrid systems for recursive computations
ACM SIGMETRICS Performance Evaluation Review
A compiler and runtime for heterogeneous computing
Proceedings of the 49th Annual Design Automation Conference
Architecture support for accelerator-rich CMPs
Proceedings of the 49th Annual Design Automation Conference
A survey on hardware-aware and heterogeneous computing on multicore processors and accelerators
Concurrency and Computation: Practice & Experience
Compiling a high-level language for GPUs: (via language support for architectures and compilers)
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Compiler and runtime support for enabling reduction computations on heterogeneous systems
Concurrency and Computation: Practice & Experience
CHARM: a composable heterogeneous accelerator-rich microprocessor
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
A CUDA programming toolkit on grids
International Journal of Grid and Utility Computing
Workload and power budget partitioning for single-chip heterogeneous processors
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
A new perspective for efficient virtual-cache coherence
Proceedings of the 40th Annual International Symposium on Computer Architecture
Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
Boosting CUDA Applications with CPU---GPU Hybrid Computing
International Journal of Parallel Programming
Hi-index | 0.01 |
Future mainstream microprocessors will likely integrate specialized accelerators, such as GPUs, onto a single die to achieve better performance and power efficiency. However, it remains a keen challenge to program such a heterogeneous multicore platform, since these specialized accelerators feature ISAs and functionality that are significantly different from the general purpose CPU cores. In this paper, we present EXOCHI: (1) Exoskeleton Sequencer(EXO), an architecture to represent heterogeneous acceleratorsas ISA-based MIMD architecture resources, and a shared virtual memory heterogeneous multithreaded program execution model that tightly couples specialized accelerator cores with generalpurpose CPU cores, and (2) C for Heterogeneous Integration(CHI), an integrated C/C++ programming environment that supports accelerator-specific inline assembly and domain-specific languages. The CHI compiler extends the OpenMP pragma for heterogeneous multithreading programming, and produces a single fat binary with code sections corresponding to different instruction sets. The runtime can judiciously spread parallel computation across the heterogeneous cores to optimize performance and power. We have prototyped the EXO architecture on a physical heterogeneous platform consisting of an Intel® Core™ 2 Duo processor and an 8-core 32-thread Intel® Graphics Media Accelerator X3000. In addition, we have implemented the CHI integrated programming environment with the Intel® C++ Compiler, runtime toolset, and debugger. On the EXO prototype system, we have enhanced a suite of production-quality media kernels for video and image processing to utilize the accelerator through the CHI programming interface, achieving significant speedup (1.41X to10.97X) over execution on the IA32 CPU alone.