Pangaea: a tightly-coupled IA32 heterogeneous chip multiprocessor

  • Authors:
  • Henry Wong;Anne Bracy;Ethan Schuchman;Tor M. Aamodt;Jamison D. Collins;Perry H. Wang;Gautham Chinya;Ankur Khandelwal Groen;Hong Jiang;Hong Wang

  • Affiliations:
  • University of British Columbia, Vancouver, BC, Canada;Intel Corporation, Santa Clara, CA, USA;Intel Corporation, Santa Clara, CA, USA;University of British Columbia, Vancouver, BC, Canada;Intel Corporation, Santa Clara, CA, USA;Intel Corporation, Santa Clara, CA, USA;Intel Corporation, Santa Clara, CA, USA;Intel Corporation, Santa Clara, CA, USA;Intel Corporation, Santa Clara, CA, USA;Intel Corporation, Santa Clara, CA, USA

  • Venue:
  • Proceedings of the 17th international conference on Parallel architectures and compilation techniques
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Moore's Law and the drive towards performance efficiency have led to the on-chip integration of general-purpose cores with special-purpose accelerators. Pangaea is a heterogeneous CMP design for non-rendering workloads that integrates IA32 CPU cores with non-IA32 GPU-class multi-cores, extending the current state-of-the-art CPU-GPU integration that physically "fuses" existing CPU and GPU designs. Pangaea introduces (1) a resource repartitioning of the GPU, where the hardware budget dedicated for 3D-specific graphics processing is used to build more general-purpose GPU cores, and (2) a 3-instruction extension to the IA32 ISA that supports tighter architectural integration and fine-grain shared memory collaborative multithreading between the IA32 CPU cores and the non-IA32 GPU cores. We implement Pangaea and the current CPU-GPU designs in fully-functional synthesizable RTL based on the production quality RTL of an IA32 CPU and an Intel GMA X4500 GPU. On a 65 nm ASIC process technology, the legacy graphics-specific fixed-function hardware has the area of 9 GPU cores and total power consumption of 5 GPU cores. With the ISA extensions, the latency from the time an IA32 core spawns a GPU thread to the time the thread begins execution is reduced from thousands of cycles to fewer than 30 cycles. Pangaea is synthesized on a FPGA-based prototype and runs off-the-shelf IA32 OSes. A set of general-purpose non-graphics workloads demonstrate speedups of up to 8.8x.