EXOCHI: architecture and programming environment for a heterogeneous multi-core multithreaded system

  • Authors:
  • Perry H. Wang;Jamison D. Collins;Gautham N. Chinya;Hong Jiang;Xinmin Tian;Milind Girkar;Nick Y. Yang;Guei-Yuan Lueh;Hong Wang

  • Affiliations:
  • Intel, Santa Clara, CA;Intel, Santa Clara, CA;Intel, Hillsboro, OR;Intel, Folsom, CA;Intel, Santa Clara, CA;Intel, Santa Clara, CA;Intel, Folsom, CA;Intel, Santa Clara, CA;Intel, Santa Clara, CA

  • Venue:
  • Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
  • Year:
  • 2007

Abstract

Future mainstream microprocessors will likely integrate specialized accelerators, such as GPUs, onto a single die to achieve better performance and power efficiency. However, it remains a keen challenge to program such a heterogeneous multicore platform, since these specialized accelerators feature ISAs and functionality that are significantly different from the general purpose CPU cores. In this paper, we present EXOCHI: (1) Exoskeleton Sequencer (EXO), an architecture to represent heterogeneous accelerators as ISA-based MIMD architecture resources, and a shared virtual memory heterogeneous multithreaded program execution model that tightly couples specialized accelerator cores with general-purpose CPU cores, and (2) C for Heterogeneous Integration (CHI), an integrated C/C++ programming environment that supports accelerator-specific inline assembly and domain-specific languages. The CHI compiler extends the OpenMP pragma for heterogeneous multithreading programming, and produces a single fat binary with code sections corresponding to different instruction sets. The runtime can judiciously spread parallel computation across the heterogeneous cores to optimize performance and power. We have prototyped the EXO architecture on a physical heterogeneous platform consisting of an Intel® Core™ 2 Duo processor and an 8-core 32-thread Intel® Graphics Media Accelerator X3000. In addition, we have implemented the CHI integrated programming environment with the Intel® C++ Compiler, runtime toolset, and debugger. On the EXO prototype system, we have enhanced a suite of production-quality media kernels for video and image processing to utilize the accelerator through the CHI programming interface, achieving significant speedup (1.41X to 10.97X) over execution on the IA32 CPU alone.