The optimum pipeline depth for a microprocessor
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Optimizing pipelines for power and performance
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Optimum Power/Performance Pipeline Depth
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Design and reliability challenges in nanometer technologies
Proceedings of the 41st annual Design Automation Conference
Power-optimal pipelining in deep submicron technology
Proceedings of the 2004 international symposium on Low power electronics and design
Power efficiency for variation-tolerant multicore processors
Proceedings of the 2006 international symposium on Low power electronics and design
Impact of process variations on multicore performance symmetry
Proceedings of the conference on Design, automation and test in Europe
ISLPED '07 Proceedings of the 2007 international symposium on Low power electronics and design
Larrabee: a many-core x86 architecture for visual computing
ACM SIGGRAPH 2008 papers
Characterizing chip-multiprocessor variability-tolerance
Proceedings of the 45th annual Design Automation Conference
Energy-efficient scheduling of real-time periodic tasks in multicore systems
NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing
Designing for dark silicon: a methodological perspective on energy efficient systems
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Architecturally homogeneous power-performance heterogeneous multicore systems
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A measurement study of GPU DVFS on energy conservation
Proceedings of the Workshop on Power-Aware Computing and Systems
Improving platform energy: chip area trade-off in near-threshold computing environment
Proceedings of the International Conference on Computer-Aided Design
Hi-index | 0.01 |
Recently, processor manufacturers have integrated more than a hundred cores in a single die to deliver extremely high throughput for highly-parallel, data-intensive applications like physics simulations, 3D-graphics, etc. Meanwhile, excessive power consumption rather than silicon area will limit the performance of many-core processors running the aforementioned applications. In this paper, to optimize the total power of many-core processors, we analyze the impact of 1) the number of cores, 2) parallelism in applications, and 3) supply voltage scaling limit due to on-die memory failure at low supply voltage. Our analysis shows that doubling the number of cores with lower than nominal supply voltage offers the most cost-effective power reduction, resulting in up to 65% less power consumption for highly-parallel applications even when supply voltage scaling is limited to 0.7V. The reduced power, in turn, can be used to improve throughput at higher voltage in power-constrained many-core processors. Furthermore, we extend our analysis to consider within-die core-to-core frequency and leakage variations. When only a subset of cores in a many-core processor are to be chosen to achieve a demanded throughput, moderately fast and leaky cores always provide optimal power consumption. In addition, frequency-island clocking, which allows independent frequency for each core, leads to 7% less power consumption than global clocking, and it prefers the fastest core (among the chosen ones) to process the totally sequential portion of workload.