Power Issues Related to Branch Prediction

Authors:
Dharmesh Parikh; Kevin Skadron; Yan Zhang; Marco Barcella; Mircea R. Stan
Venue:
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Year:
2002

Abstract

This paper explores the role of branch predictor organization in power/energy/performance tradeoffs for processor design. We find that, as a general rule, it is worthwhile to spend more power in the branch predictor to reduce overall processor energy consumption, provided the extra power buys more accurate predictions that improve running time. Two techniques, however, provide substantial reductions in power dissipation without harming accuracy. The first, banking, reduces the portion of the branch predictor that is active at any one time. The second, a new on-chip structure called the prediction probe detector (PPD), uses pre-decode bits to entirely eliminate unnecessary predictor and BTB accesses. Despite the extra power that must be spent accessing the PPD, it reduces local predictor power and energy dissipation by about 45% and overall processor power and energy dissipation by 5-6%.
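
The PPD idea in the abstract can be illustrated with a small behavioral sketch. It is not the authors' implementation: the class and function names, the two-flag entry layout, the direct-mapped indexing by cache-line address, and the table sizes are all assumptions made for illustration. The sketch only shows how per-line pre-decode information could gate predictor and BTB lookups during fetch.

```python
from dataclasses import dataclass


@dataclass
class PPDEntry:
    # Hypothetical entry layout: one flag per structure that may be skipped.
    needs_predictor: bool  # line holds at least one conditional branch
    needs_btb: bool        # line holds at least one control-flow instruction


class PredictionProbeDetector:
    """Illustrative PPD model, indexed by instruction-cache line address.

    Assumption: an entry is rewritten whenever its cache line is refilled,
    so a stale entry never hides a branch from the predictor.
    """

    def __init__(self, num_entries: int, line_bytes: int):
        self.num_entries = num_entries
        self.line_bytes = line_bytes
        # Conservative default: probe both structures until pre-decode
        # information for the line is available.
        self.entries = [PPDEntry(True, True) for _ in range(num_entries)]

    def _index(self, addr: int) -> int:
        return (addr // self.line_bytes) % self.num_entries

    def fill(self, line_addr: int, has_cond_branch: bool, has_control_flow: bool) -> None:
        """Record pre-decode bits for a line as it is brought into the cache."""
        self.entries[self._index(line_addr)] = PPDEntry(has_cond_branch, has_control_flow)

    def probe(self, fetch_addr: int) -> PPDEntry:
        """Look up the entry that decides which structures to access."""
        return self.entries[self._index(fetch_addr)]


def count_accesses(ppd: PredictionProbeDetector, fetch_addrs):
    """Count the predictor and BTB lookups that remain after PPD gating."""
    predictor_accesses = 0
    btb_accesses = 0
    for addr in fetch_addrs:
        entry = ppd.probe(addr)
        predictor_accesses += int(entry.needs_predictor)
        btb_accesses += int(entry.needs_btb)
    return predictor_accesses, btb_accesses


# Example: a 4-entry PPD over 32-byte lines (sizes chosen arbitrarily).
ppd = PredictionProbeDetector(num_entries=4, line_bytes=32)
ppd.fill(0, has_cond_branch=False, has_control_flow=False)   # straight-line code
ppd.fill(32, has_cond_branch=True, has_control_flow=True)    # conditional branch
ppd.fill(64, has_cond_branch=False, has_control_flow=True)   # e.g. unconditional jump
result = count_accesses(ppd, [0, 4, 32, 64, 96])
```

Without gating, the five fetches in this example would trigger five predictor lookups and five BTB lookups. With the sketch above, only lines flagged by pre-decode (or not yet filled) cause an access. The PPD lookup itself also costs power, which is the overhead the abstract says the savings outweigh.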