Workload and power budget partitioning for single-chip heterogeneous processors

  • Authors:
  • Hao Wang;Vijay Sathish;Ripudaman Singh;Michael J. Schulte;Nam Sung Kim

  • Affiliations:
  • The University of Wisconsin-Madison, Madison, WI, USA;The University of Wisconsin-Madison, Madison, WI, USA;The University of Wisconsin-Madison, Madison, WI, USA;Advanced Micro Devices, Austin, TX, USA;The University of Wisconsin-Madison, Madison, WI, USA

  • Venue:
  • Proceedings of the 21st international conference on Parallel architectures and compilation techniques
  • Year:
  • 2012

Abstract

With technology scaling, manufacturers are integrating both CPU and GPU cores on a single chip to improve the throughput of emerging applications. To maximize the throughput of a single-chip heterogeneous processor (SCHP), the chip power budget shared between the CPU and GPU must be effectively utilized. At the same time, the CPU and GPU in an SCHP must each satisfy its own power constraint. Furthermore, the power budget allocated to the CPU and to the GPU affects the performance of each. In this paper, using a detailed cycle-level SCHP simulator, we first demonstrate that the joint optimization of workload and power budget partitioning between the CPU and GPU can provide 13% higher throughput than the optimization of workload partitioning alone under a fixed power budget allocation to the CPU and GPU. Second, we propose an effective runtime algorithm that can determine near-optimal or optimal combinations of workload and power budget partitioning. The algorithm exploits the runtime power efficiencies of the workload executed on the CPU and the GPU. Using the detailed cycle-level SCHP simulator, we show that, within five to eight kernel invocations, the algorithm can achieve 96% of the maximum throughput obtained by an exhaustive search algorithm. Finally, we demonstrate comparable throughput improvements when we apply the algorithm to a commercial computing system with an SCHP.
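
The difference between optimizing workload partitioning alone and optimizing it jointly with power budget partitioning can be illustrated with a small exhaustive search. The sketch below is not the paper's runtime algorithm or its simulator. It assumes a toy performance model in which each device's throughput grows with the cube root of its power allocation, and every name and number in it (device_rate, the coefficients, the 40 W budget, the 30 W per-device caps) is hypothetical.

```python
# Toy illustration only: the performance model and all constants are assumptions.

def device_rate(coeff, power_w):
    """Assumed model: device throughput scales with the cube root of its power."""
    return coeff * power_w ** (1.0 / 3.0)

def throughput(cpu_share, cpu_power, total_power, cpu_coeff=1.0, gpu_coeff=4.0):
    """Kernel throughput when cpu_share of the work runs on the CPU."""
    gpu_power = total_power - cpu_power
    cpu_time = cpu_share / device_rate(cpu_coeff, cpu_power)
    gpu_time = (1.0 - cpu_share) / device_rate(gpu_coeff, gpu_power)
    # The devices run concurrently, so the slower one sets the kernel time.
    return 1.0 / max(cpu_time, gpu_time)

def feasible_cpu_powers(total_power, cpu_cap, gpu_cap):
    """CPU allocations (whole watts) that respect both per-device caps."""
    return [p for p in range(1, total_power)
            if p <= cpu_cap and total_power - p <= gpu_cap]

def best_partition(cpu_powers, total_power):
    """Exhaustive search; returns (throughput, cpu_share, cpu_power)."""
    shares = [i / 100 for i in range(101)]
    return max((throughput(s, p, total_power), s, p)
               for p in cpu_powers for s in shares)

TOTAL, CPU_CAP, GPU_CAP = 40, 30, 30  # hypothetical budget and caps, in watts

# Workload partitioning alone, with the power split fixed at 20 W / 20 W.
fixed = best_partition([20], TOTAL)
# Joint search over workload share and power split.
joint = best_partition(feasible_cpu_powers(TOTAL, CPU_CAP, GPU_CAP), TOTAL)

print("fixed split:", fixed)
print("joint search:", joint)
```

Because the fixed 20 W / 20 W split is one of the feasible points, the joint search can never do worse than it under this model. How much it gains depends entirely on the assumed coefficients, so the printed numbers say nothing about the 13% figure reported in the abstract.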