Dark silicon and the end of multicore scaling

Authors:
Hadi Esmaeilzadeh;Emily Blem;Renee St. Amant;Karthikeyan Sankaralingam;Doug Burger
Affiliations:
University of Washington, Seattle, WA, USA;University of Wisconsin-Madison, Madison, WI, USA;The University of Texas at Austin, Austin, TX, USA;University of Wisconsin-Madison, Madison, WI, USA;Microsoft Research, Seattle, WA, USA
Venue:
Proceedings of the 38th annual international symposium on Computer architecture
Year:
2011

Abstract

Since 2005, processor designers have increased core counts to exploit Moore's Law scaling, rather than focusing on single-core performance. The failure of Dennard scaling, to which the shift to multicore parts is partially a response, may soon limit multicore scaling just as single-core scaling has been curtailed. This paper models multicore scaling limits by combining device scaling, single-core scaling, and multicore scaling to measure the speedup potential for a set of parallel workloads for the next five technology generations. For device scaling, we use both the ITRS projections and a set of more conservative device scaling parameters. To model single-core scaling, we combine measurements from over 150 processors to derive Pareto-optimal frontiers for area/performance and power/performance. Finally, to model multicore scaling, we build a detailed performance model of upper-bound performance and lower-bound core power. The multicore designs we study include single-threaded CPU-like and massively threaded GPU-like multicore chip organizations with symmetric, asymmetric, dynamic, and composed topologies. The study shows that regardless of chip organization and topology, multicore scaling is power limited to a degree not widely appreciated by the computing community. Even at 22 nm (just one year from now), 21% of a fixed-size chip must be powered off, and at 8 nm, this number grows to more than 50%. Through 2024, only 7.9x average speedup is possible across commonly used parallel workloads, leaving a nearly 24-fold gap from a target of doubled performance per generation.
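
As a rough illustration of how a power budget, rather than chip area, can cap the number of usable cores, the following Python sketch combines Amdahl's law for identical cores with a fixed power budget. It is not the paper's model, which relies on measured Pareto frontiers and device scaling projections. The function names, the uniform per-core area and power, and every numeric value below are hypothetical assumptions chosen only for illustration.

```python
def symmetric_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law speedup over one core, for `cores` identical cores."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


def core_counts(area_budget: float, power_budget: float,
                core_area: float, core_power: float) -> tuple[int, int]:
    """Return (cores that can be powered, cores that fit on the die).

    Assumes every core has the same area and power (a simplification).
    """
    fit = int(area_budget // core_area)
    powered = min(fit, int(power_budget // core_power))
    return powered, fit


def dark_fraction(area_budget: float, power_budget: float,
                  core_area: float, core_power: float) -> float:
    """Fraction of the cores that fit on the die but must stay powered off."""
    powered, fit = core_counts(area_budget, power_budget, core_area, core_power)
    return 1.0 - powered / fit


if __name__ == "__main__":
    # Hypothetical chip: room for 10 cores, power for only 8 of them.
    powered, fit = core_counts(100.0, 80.0, 10.0, 10.0)
    print("cores that fit:", fit, "cores that can be powered:", powered)
    print("dark fraction:", dark_fraction(100.0, 80.0, 10.0, 10.0))
    print("speedup at 90% parallel:", symmetric_speedup(0.9, powered))
```

In this toy setting the speedup is bounded twice: by the serial fraction of the workload and by the number of cores the power budget can keep active, whichever binds first.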