Abstract: This paper uses Wattch and the SPEC 2000 integer and floating-point benchmarks to explore the role of branch predictor organization in power/energy/performance trade-offs for processor design. Even though the direction predictor by itself represents less than 1 percent of the processor's total power dissipation, prediction accuracy is nevertheless a powerful lever on processor behavior and program execution time. A thorough study of branch predictor organizations shows that, as a general rule, it is worthwhile to spend more power in the branch predictor if doing so yields more accurate predictions that shorten running time, because the shorter running time reduces the processor's overall energy consumption. This not only improves performance, but can also improve the energy-delay product by up to 20 percent. Three techniques, however, can reduce power dissipation without harming accuracy. Banking reduces the portion of the branch predictor that is active at any one time. A new on-chip structure, the prediction probe detector (PPD), uses predecode bits to entirely eliminate unnecessary predictor and branch target buffer (BTB) accesses. Despite the extra power that must be spent accessing it, the PPD reduces local predictor power and energy dissipation by about 31 percent and overall processor power and energy dissipation by 3 percent. These savings can be further improved by using profiling to annotate branches, identifying those that are highly biased and do not require dynamic prediction. Finally, the paper explores the effectiveness of a previously proposed technique, pipeline gating, and finds that, even with adaptive control based on recent predictor accuracy, pipeline gating yields little or no energy savings.
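The energy argument in the abstract can be illustrated with a small calculation. The sketch below is not taken from the paper: the function name `energy_delay` and all power and runtime figures are hypothetical, chosen only to show how a slight rise in chip power can still lower total energy and energy-delay product when running time shrinks. It also assumes average power is constant over the run.

```python
def energy_delay(avg_power_watts, runtime_seconds):
    """Return (energy in joules, energy-delay product in joule-seconds).

    Assumes constant average power over the run (a simplification).
    """
    energy = avg_power_watts * runtime_seconds
    return energy, energy * runtime_seconds


# Hypothetical baseline: a small predictor, 40 W chip power, 1.00 s runtime.
small = energy_delay(avg_power_watts=40.0, runtime_seconds=1.00)

# Hypothetical alternative: a larger predictor raises chip power by 1 percent
# but cuts runtime by 5 percent through fewer mispredictions.
large = energy_delay(avg_power_watts=40.4, runtime_seconds=0.95)

for name, (energy, edp) in (("small", small), ("large", large)):
    print(f"{name} predictor: energy={energy:.2f} J, EDP={edp:.2f} J*s")
```

With these illustrative inputs the larger predictor wins on both metrics, because runtime enters energy once and energy-delay product twice, while the extra predictor power is a small fraction of the chip total.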