A unified view of non-monotonic core selection and application steering in heterogeneous chip multiprocessors

  • Authors:
  • Sandeep Navada; Niket K. Choudhary; Salil V. Wadhavkar; Eric Rotenberg

  • Affiliations:
  • Qualcomm, Raleigh, NC, USA; Qualcomm, Raleigh, NC, USA; Qualcomm, Raleigh, NC, USA; North Carolina State University, Raleigh, NC, USA

  • Venue:
  • PACT '13: Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques
  • Year:
  • 2013

Abstract

A single-ISA heterogeneous chip multiprocessor (HCMP) is an attractive substrate for improving single-thread performance and energy efficiency in the dark silicon era. We consider HCMPs composed of non-monotonic core types, where each core type is performance-optimized for different instruction-level behavior and hence the core types cannot be ranked: different program phases achieve their highest performance on different cores. Although non-monotonic heterogeneous designs offer higher performance potential than either monotonic heterogeneous designs or homogeneous designs, steering applications to the best-performing core is challenging because of this performance ambiguity among core types. In this paper, we present a unified view of selecting non-monotonic core types at design time and steering program phases to cores at run time. After a comprehensive evaluation, we found that with N core types, the optimal HCMP for single-thread performance is composed of an "average core" type coupled with N-1 "accelerator core" types that relieve distinct resource bottlenecks in the average core. This inspires a complementary steering algorithm in which a running program is continuously diagnosed for bottlenecks on the current core. If any are observed, the program is migrated to an accelerator core that relieves at least one of the observed bottlenecks and worsens none of them. If no accelerator core satisfies this condition, the average core is selected. In our evaluation, we show that a 4-core-type HCMP improves single-thread performance by up to 76%, and by 15% on average, over a homogeneous chip multiprocessor, and that our steering algorithm captures most of this performance gain. Further, we show that our steering algorithm on a 4-core-type HCMP is, on average, 33% more power-efficient (BIPS³/watt) than a homogeneous chip multiprocessor.
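The steering rule summarized in the abstract can be sketched as a small selection function. This is a minimal sketch, not the authors' implementation. Everything concrete in it is a hypothetical assumption, because the abstract does not say how bottlenecks are diagnosed or encoded. That includes the bottleneck labels (`issue_width`, `window_size`, `frequency`), the `CoreType` structure with per-core `relieves`/`worsens` sets, the example accelerator names, and the tie-break (prefer the accelerator that relieves the most observed bottlenecks, then the earliest listed).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreType:
    """A core type described by the (hypothetical) bottleneck kinds it relieves
    or worsens relative to the average core."""
    name: str
    relieves: frozenset = frozenset()
    worsens: frozenset = frozenset()


AVERAGE_CORE = CoreType("average")


def select_core(observed, accelerators, average=AVERAGE_CORE):
    """Choose the core for the next interval from bottlenecks observed on the current core.

    An accelerator qualifies only if it relieves at least one observed bottleneck
    and worsens none of them; if no accelerator qualifies, the average core is chosen.
    """
    observed = frozenset(observed)
    best, best_gain = None, 0
    for core in accelerators:
        gain = len(core.relieves & observed)  # observed bottlenecks this core relieves
        if gain == 0 or core.worsens & observed:
            continue                          # relieves nothing, or worsens an observed bottleneck
        if gain > best_gain:                  # assumed tie-break: most bottlenecks relieved,
            best, best_gain = core, gain      # then earliest in the list
    return best if best is not None else average


# Hypothetical 4-core-type HCMP: one average core plus three accelerator core types.
accels = [
    CoreType("wide_issue", frozenset({"issue_width"}), frozenset({"frequency"})),
    CoreType("big_window", frozenset({"window_size"}), frozenset({"frequency"})),
    CoreType("fast_clock", frozenset({"frequency"}), frozenset({"window_size"})),
]

print(select_core({"window_size"}, accels).name)               # big_window
print(select_core({"issue_width", "frequency"}, accels).name)  # fast_clock
print(select_core({"window_size", "frequency"}, accels).name)  # average
print(select_core(set(), accels).name)                         # average
```

In the third call, both candidate accelerators would worsen one of the observed bottlenecks, so the sketch falls back to the average core. With no observed bottlenecks, no accelerator relieves anything, so the average core is also selected. That matches a literal reading of the rule but is an interpretation, not something the abstract states.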