Power Limitations and Dark Silicon Challenge the Future of Multicore

Authors:
Hadi Esmaeilzadeh;Emily Blem;Renée St. Amant;Karthikeyan Sankaralingam;Doug Burger
Affiliations:
University of Washington;University of Wisconsin-Madison;The University of Texas at Austin;University of Wisconsin-Madison;Microsoft Research
Venue:
ACM Transactions on Computer Systems (TOCS)
Year:
2012

Abstract

Since 2004, processor designers have increased core counts to exploit Moore’s Law scaling, rather than focusing on single-core performance. The failure of Dennard scaling, to which the shift to multicore parts is partially a response, may soon limit multicore scaling just as single-core scaling has been curtailed. This paper models multicore scaling limits by combining device scaling, single-core scaling, and multicore scaling to measure the speedup potential for a set of parallel workloads for the next five technology generations. For device scaling, we use both the ITRS projections and a set of more conservative device scaling parameters. To model single-core scaling, we combine measurements from over 150 processors to derive Pareto-optimal frontiers for area/performance and power/performance. Finally, to model multicore scaling, we build a detailed performance model of upper-bound performance and lower-bound core power. The multicore designs we study include single-threaded CPU-like and massively threaded GPU-like multicore chip organizations with symmetric, asymmetric, dynamic, and composed topologies. The study shows that regardless of chip organization and topology, multicore scaling is power limited to a degree not widely appreciated by the computing community. Even at 22 nm (just one year from now), 21% of a fixed-size chip must be powered off, and at 8 nm, this number grows to more than 50%. Through 2024, only 7.9× average speedup is possible across commonly used parallel workloads for the topologies we study, leaving a nearly 24-fold gap from a target of doubled performance per generation.
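
The closing figures in the abstract can be checked with a few lines of arithmetic. The sketch below is one reading of those numbers, not the authors' model. It assumes that "doubled performance per generation" compounds over the five technology generations mentioned in the abstract, and the names `GENERATIONS`, `PROJECTED_SPEEDUP`, and `target_speedup` are hypothetical, introduced only for illustration. Under that assumption the target is 32×, and the "nearly 24-fold gap" appears to correspond to the difference between the target and the projected 7.9× rather than to their ratio.

```python
# Assumption: the doubling target compounds once per technology generation,
# over the five generations mentioned in the abstract.
GENERATIONS = 5
# Average projected speedup through 2024, taken from the abstract.
PROJECTED_SPEEDUP = 7.9


def target_speedup(generations: int, factor: float = 2.0) -> float:
    """Cumulative speedup if performance grows by `factor` each generation."""
    return factor ** generations


target = target_speedup(GENERATIONS)
gap_as_difference = target - PROJECTED_SPEEDUP
gap_as_ratio = target / PROJECTED_SPEEDUP

print(f"target speedup:      {target:.0f}x")
print(f"gap as a difference: {gap_as_difference:.1f}x")
print(f"gap as a ratio:      {gap_as_ratio:.2f}x")
```

Changing `GENERATIONS` or the `factor` argument shows how sensitive the gap is to the assumed target; the projected 7.9× itself comes from the paper's model and is not derived here.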