A unified view of non-monotonic core selection and application steering in heterogeneous chip multiprocessors

Authors:
Sandeep Navada;Niket K. Choudhary;Salil V. Wadhavkar;Eric Rotenberg
Affiliations:
Qualcomm, Raleigh, NC, USA;Qualcomm, Raleigh, NC, USA;Qualcomm, Raleigh, NC, USA;North Carolina State University, Raleigh, NC, USA
Venue:
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Year:
2013

Abstract

A single-ISA heterogeneous chip multiprocessor (HCMP) is an attractive substrate for improving single-thread performance and energy efficiency in the dark silicon era. We consider HCMPs composed of non-monotonic core types, where each core type is performance-optimized for different instruction-level behavior and hence the core types cannot be ranked: different program phases achieve their highest performance on different cores. Although non-monotonic heterogeneous designs offer higher performance potential than either monotonic heterogeneous designs or homogeneous designs, steering applications to the best-performing core is challenging because the relative performance of the core types is ambiguous. In this paper, we present a unified view of selecting non-monotonic core types at design time and steering program phases to cores at run time. Through comprehensive evaluation, we find that with N core types, the optimal HCMP for single-thread performance consists of an "average core" type coupled with N-1 "accelerator core" types that relieve distinct resource bottlenecks in the average core. This inspires a complementary steering algorithm in which a running program is continuously diagnosed for bottlenecks on its current core. If any are observed, the program is migrated to an accelerator core that relieves at least one of the bottlenecks and does not worsen any of them. If no accelerator core satisfies this condition, the average core is selected. In our evaluation, we show that a 4-core-type HCMP improves single-thread performance by up to 76%, and by 15% on average, over a homogeneous chip multiprocessor, and that our steering algorithm captures most of this performance gain. Further, we show that our steering algorithm on a 4-core-type HCMP is, on average, 33% more power-efficient (BIPS^3/watt) than a homogeneous chip multiprocessor.
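
The steering rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `CoreType` record, the `steer` function, and the bottleneck labels are hypothetical names introduced only for illustration, and the abstract does not say how bottlenecks are diagnosed. The sketch also assumes that a program with no observed bottleneck stays on its current core, and that the current core is never chosen as its own migration target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreType:
    """Hypothetical description of one core type in the HCMP."""
    name: str
    relieves: frozenset  # bottlenecks this core relieves relative to the average core
    worsens: frozenset   # bottlenecks this core makes worse


def steer(current, observed, accelerators, average):
    """Pick the core type for the next interval.

    current      -- core type the program is running on now
    observed     -- set of bottleneck labels diagnosed on the current core
    accelerators -- list of accelerator core types, in preference order
    average      -- the "average core" type used as the fallback
    """
    # Assumption: with no diagnosed bottleneck, the program stays where it is.
    if not observed:
        return current

    for acc in accelerators:
        if acc == current:
            continue  # assumption: do not "migrate" to the core already in use
        relieves_some = bool(acc.relieves & observed)
        worsens_some = bool(acc.worsens & observed)
        # Migrate only if the accelerator relieves at least one observed
        # bottleneck and worsens none of them.
        if relieves_some and not worsens_some:
            return acc

    # No accelerator qualifies, so fall back to the average core.
    return average
```

For example, with a hypothetical accelerator that relieves an `"issue_width"` bottleneck but worsens `"l1_latency"`, a phase limited only by issue width would be steered to that accelerator, while a phase limited by both would be sent to the average core.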