Mitigating Amdahl's Law through EPI Throttling
Proceedings of the 32nd annual international symposium on Computer Architecture
This paper describes the tradeoff between latency performance and throughput performance in a power-constrained environment. We show that the key to achieving both excellent latency performance and excellent throughput performance is to dynamically vary the energy expended per instruction according to the amount of parallelism available in the software. We survey four techniques for achieving variable energy per instruction (EPI): voltage/frequency scaling, asymmetric cores, variable-size cores, and speculation control. We estimate the range of energies obtainable with each technique and conclude that combining asymmetric cores with voltage/frequency scaling is the most promising approach to designing a chip-level multiprocessor that achieves both excellent latency performance and excellent throughput performance.
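The reasoning behind varying energy per instruction can be illustrated with the identity power = energy per instruction × instructions per second. The sketch below is only an illustration under stated assumptions: the function name `epi_budget`, the 100 W budget, and the instruction rates are hypothetical values chosen for the example, not figures from the paper.

```python
def epi_budget(power_watts: float, instr_per_sec: float) -> float:
    """Return the energy per instruction (joules) that exactly uses the power budget.

    Follows from: power (W) = EPI (J/instruction) * rate (instructions/s).
    """
    if power_watts <= 0 or instr_per_sec <= 0:
        raise ValueError("power and instruction rate must be positive")
    return power_watts / instr_per_sec


if __name__ == "__main__":
    POWER_BUDGET_W = 100.0  # hypothetical chip power budget

    # Hypothetical aggregate instruction rates for two software phases.
    phases = {
        "serial phase (little parallelism)": 10e9,
        "parallel phase (many threads)": 100e9,
    }

    for name, rate in phases.items():
        epi_nj = epi_budget(POWER_BUDGET_W, rate) * 1e9  # joules -> nanojoules
        print(f"{name}: up to {epi_nj:.2f} nJ per instruction")
```

Under these assumed numbers, the serial phase can afford ten times more energy per instruction than the parallel phase within the same power budget, which is the headroom a throttling scheme would spend on single-thread latency.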