Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Genetic Algorithms in Search, Optimization and Machine Learning
Genetic Algorithms in Search, Optimization and Machine Learning
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
ET2: a metric for time and energy efficiency of computation
Power aware computing
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance
Proceedings of the 31st annual international symposium on Computer architecture
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Best of Both Latency and Throughput
ICCD '04 Proceedings of the IEEE International Conference on Computer Design
Speculative Execution In High Performance Computer Architectures (Chapman & Hall/Crc Computer & Information Science Series)
Dynamic thread assignment on heterogeneous multiprocessor architectures
Proceedings of the 3rd conference on Computing frontiers
Core architecture optimization for heterogeneous chip multiprocessors
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Illustrative Design Space Studies with Microarchitectural Regression Models
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Measuring Program Similarity: Experiments with SPEC CPU Benchmark Suites
ISPASS '05 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
Amdahl's Law in the Multicore Era
Computer
Accelerating critical section execution with asymmetric multi-core architectures
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
HASS: a scheduler for heterogeneous multicore systems
ACM SIGOPS Operating Systems Review
Configurational Workload Characterization
ISPASS '08 Proceedings of the ISPASS 2008 - IEEE International Symposium on Performance Analysis of Systems and software
Efficient program scheduling for heterogeneous multi-core processors
Proceedings of the 46th Annual Design Automation Conference
Core-Selectability in Chip Multiprocessors
PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
Bias scheduling in heterogeneous multi-core architectures
Proceedings of the 5th European conference on Computer systems
A comprehensive scheduler for asymmetric multicore systems
Proceedings of the 5th European conference on Computer systems
Energy-performance tradeoffs in processor architecture and circuit design: a marginal cost analysis
Proceedings of the 37th annual international symposium on Computer architecture
Scalable thread scheduling and global power management for heterogeneous many-core architectures
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Criticality-driven superscalar design space exploration
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Dynamic voltage and frequency scaling: the laws of diminishing returns
HotPower'10 Proceedings of the 2010 international conference on Power aware computing and systems
A Predictive Model for Dynamic Microarchitectural Adaptivity Control
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 38th annual international symposium on Computer architecture
Dark silicon and the end of multicore scaling
Proceedings of the 38th annual international symposium on Computer architecture
Phase-Guided Scheduling on Single-ISA Heterogeneous Multicore Processors
DSD '11 Proceedings of the 2011 14th Euromicro Conference on Digital System Design
Efficiently exploiting memory level parallelism on asymmetric coupled cores in the dark silicon era
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Performance Per Watt Benefits of Dynamic Core Morphing in Asymmetric Multicores
PACT '11 Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques
Phase-based tuning for better utilization of performance-asymmetric multicore processors
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Scheduling heterogeneous multi-cores through Performance Impact Estimation (PIE)
Proceedings of the 39th Annual International Symposium on Computer Architecture
Composite Cores: Pushing Heterogeneity Into a Core
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
A single-ISA heterogeneous chip multiprocessor (HCMP) is an attractive substrate to improve single-thread performance and energy efficiency in the dark silicon era. We consider HCMPs comprised of non-monotonic core types where each core type is performance-optimized to different instruction-level behavior and hence cannot be ranked -- different program phases achieve their highest performance on different cores. Although non-monotonic heterogeneous designs offer higher performance potential than either monotonic heterogeneous designs or homogeneous designs, steering applications to the best-performing core is challenging due to performance ambiguity of core types. In this paper, we present a unified view of selecting non-monotonic core types at design-time and steering program phases to cores at run-time. After comprehensive evaluation, we found that with N core types, the optimal HCMP for single-thread performance is comprised of an "average core" type coupled with N-1 "accelerator core" types that relieve distinct resource bottlenecks in the average core. This inspires a complementary steering algorithm in which a running program is continuously diagnosed for bottlenecks on the current core. If any are observed, the program is migrated to an accelerator core that relieves any of the bottlenecks and does not worsen any of them. If no accelerator core satisfies this condition, then the average core is selected. In our evaluation, we show that a 4-core-type HCMP improves single-thread performance up to 76% and 15% on average over a homogeneous chip multiprocessor, and our steering algorithm is able to capture most of this performance gain. Further, we show that our steering algorithm on a 4-core-type HCMP is, on average, 33% more power-efficient (BIPS3/watt) than a homogeneous chip multiprocessor.