Power-Aware Branch Prediction: Characterization and Design

Authors:
Dharmesh Parikh; Kevin Skadron; Yan Zhang; Mircea Stan
Venue:
IEEE Transactions on Computers
Year:
2004

Abstract

This paper uses Wattch and the SPEC 2000 integer and floating-point benchmarks to explore the role of branch predictor organization in power/energy/performance trade-offs for processor design. Even though the direction predictor by itself represents less than 1 percent of the processor's total power dissipation, prediction accuracy is nevertheless a powerful lever on processor behavior and program execution time. A thorough study of branch predictor organizations shows that, as a general rule, to reduce overall energy consumption in the processor, it is worthwhile to spend more power in the branch predictor if this results in more accurate predictions that improve running time. This not only improves performance, but can also improve the energy-delay product by up to 20 percent. Three techniques, however, can reduce power dissipation without harming accuracy. Banking reduces the portion of the branch predictor that is active at any one time. A new on-chip structure, the prediction probe detector (PPD), uses predecode bits to entirely eliminate unnecessary predictor and branch target buffer (BTB) accesses. Despite the extra power that must be spent accessing it, the PPD reduces local predictor power and energy dissipation by about 31 percent and overall processor power and energy dissipation by 3 percent. These savings can be further improved by using profiling to annotate branches, identifying those that are highly biased and do not require dynamic prediction. Finally, the paper explores the effectiveness of a previously proposed technique, pipeline gating, and finds that, even with adaptive control based on recent predictor accuracy, pipeline gating yields little or no energy savings.
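The PPD idea described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the class names, the two-bit entry layout, the modulo indexing, and the counters are all assumptions made for illustration. The sketch only shows the general principle that per-line predecode bits can decide whether the direction predictor and the BTB need to be probed on a given fetch.

```python
from dataclasses import dataclass, field


@dataclass
class PPDEntry:
    # Hypothetical predecode-derived bits for one fetch line (assumed layout).
    needs_direction: bool = True  # line may hold a conditional branch
    needs_btb: bool = True        # line may hold any control-flow instruction


@dataclass
class PredictionProbeDetector:
    # Illustrative model only; sizes and indexing are assumptions.
    num_entries: int
    entries: list = field(init=False)
    predictor_lookups: int = 0
    btb_lookups: int = 0
    skipped_lookups: int = 0

    def __post_init__(self):
        # Conservative default: probe both structures until bits are filled in.
        self.entries = [PPDEntry() for _ in range(self.num_entries)]

    def _index(self, line_addr: int) -> int:
        return line_addr % self.num_entries

    def fill(self, line_addr: int, has_cond_branch: bool, has_control_flow: bool) -> None:
        """Record predecode results for a line (e.g. when it is brought in)."""
        self.entries[self._index(line_addr)] = PPDEntry(has_cond_branch, has_control_flow)

    def fetch(self, line_addr: int) -> tuple:
        """Return (probe_predictor, probe_btb) and update the access counters."""
        entry = self.entries[self._index(line_addr)]
        self.predictor_lookups += int(entry.needs_direction)
        self.btb_lookups += int(entry.needs_btb)
        # Each structure that is not probed counts as one avoided access.
        self.skipped_lookups += int(not entry.needs_direction) + int(not entry.needs_btb)
        return entry.needs_direction, entry.needs_btb


ppd = PredictionProbeDetector(num_entries=4)
ppd.fill(0, has_cond_branch=False, has_control_flow=False)  # straight-line code
ppd.fill(2, has_cond_branch=False, has_control_flow=True)   # e.g. only a jump
for line in (0, 2, 3):
    print(line, ppd.fetch(line))
```

In this toy model, a line with no control-flow instructions skips both lookups, a line with only an unconditional jump skips the direction predictor, and a line whose bits have not been filled in falls back to probing both structures.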