MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Run-time power estimation in high performance microprocessors
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Proceedings of the 30th annual international symposium on Computer architecture
Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance
Proceedings of the 31st annual international symposium on Computer architecture
Millicode in an IBM zSeries processor
IBM Journal of Research and Development
Mitigating Amdahl's Law through EPI Throttling
Proceedings of the 32nd annual international symposium on Computer Architecture
The Impact of Performance Asymmetry in Emerging Multicore Architectures
Proceedings of the 32nd annual international symposium on Computer Architecture
Power prediction for intel XScale® processors using performance monitoring unit events
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Dynamic thread assignment on heterogeneous multiprocessor architectures
Proceedings of the 3rd conference on Computing frontiers
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Core architecture optimization for heterogeneous chip multiprocessors
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Core fusion: accommodating software diversity in chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
A Flexible Heterogeneous Multi-Core Architecture
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Composable Lightweight Processors
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Efficient operating system scheduling for performance-asymmetric multi-core architectures
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Amdahl's Law in the Multicore Era
Computer
HASS: a scheduler for heterogeneous multicore systems
ACM SIGOPS Operating Systems Review
Real time power estimation and thread scheduling via performance counters
ACM SIGARCH Computer Architecture News
Efficient program scheduling for heterogeneous multi-core processors
Proceedings of the 46th Annual Design Automation Conference
Core-Selectability in Chip Multiprocessors
PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
Bias scheduling in heterogeneous multi-core architectures
Proceedings of the 5th European conference on Computer systems
A comprehensive scheduler for asymmetric multicore systems
Proceedings of the 5th European conference on Computer systems
A self-adaptive scheduler for asymmetric multi-cores
Proceedings of the 20th symposium on Great lakes symposium on VLSI
WiDGET: Wisconsin decoupled grid execution tiles
Proceedings of the 37th annual international symposium on Computer architecture
Forwardflow: a scalable core for power-constrained CMPs
Proceedings of the 37th annual international symposium on Computer architecture
Scalable thread scheduling and global power management for heterogeneous many-core architectures
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Proceedings of the Conference on Design, Automation and Test in Europe
Erasing Core Boundaries for Robust and Configurable Performance
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Efficient interaction between OS and architecture in heterogeneous platforms
ACM SIGOPS Operating Systems Review
Performance Per Watt Benefits of Dynamic Core Morphing in Asymmetric Multicores
PACT '11 Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques
Microvisor: a runtime architecture for thermal management in chip multiprocessors
Transactions on High-Performance Embedded Architectures and Compilers IV
An opportunistic prediction-based thread scheduling to maximize throughput/watt in AMPs
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Hi-index | 0.00 |
Asymmetric multi-core processors (AMPs) have been shown to outperform symmetric ones in terms of performance and performance/watt. Improved performance and power efficiency are achieved when the program threads are matched to their most suitable cores. Since the computational needs of a program may change during its execution, the best thread to core assignment will likely change with time. We have, therefore, developed an online program phase classification scheme that allows the swapping of threads when the current needs of the threads justify a change in the assignment. The architectural differences among the cores in an AMP can never match the diversity that exists among different programs and even between different phases of the same program. Consider, for example, a program (or a program phase) that has a high instruction-level parallelism (ILP) and will exhibit high power efficiency if executed on a powerful core. We can not, however, include such powerful cores in the designed AMP, since they will remain underutilized most of the time, and they are not power efficient when the programs do not exhibit a high degree of ILP. Thus, we must expect to see program phases where the designed cores will be unable to support the ILP that the program can exhibit. We, therefore, propose in this article a dynamic morphing scheme. This scheme will allow a core to gain control of a functional unit that is ordinarily under the control of a neighboring core during periods of intense computation with high ILP. This way, we dynamically adjust the hardware resources to the current needs of the application. Our results show that combining online phase classification and dynamic core morphing can significantly improve the performance/watt of most multithreaded workloads.