Fast and accurate instruction fetch and branch prediction

Authors:
B. Calder;D. Grunwald
Affiliations:
Department of Computer Science, Campus Box 430, University of Colorado, Boulder, CO;Department of Computer Science, Campus Box 430, University of Colorado, Boulder, CO
Venue:
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Year:
1994

Citing 18
Cited 36

Reduced instruction set computer architectures for VLSI

Reduced instruction set computer architectures for VLSI
Reducing the cost of branches

ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Branch folding in the CRISP microprocessor: reducing branch delay to zero

ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Reducing the Branch Penalty in Pipelined Processors

Computer
Profile guided code positioning

PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Limits of instruction-level parallelism

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Branch history table prediction of moving target branches due to subroutine returns

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Strategies for branch target buffers

MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Alternative implementations of two-level adaptive branch prediction

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Improving the accuracy of dynamic branch prediction using branch correlation

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Predicting conditional branch directions from previous runs of a program

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
A comprehensive instruction fetch mechanism for a processor supporting speculative execution

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Branch prediction for free

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
A comparison of dynamic branch predictors that use two levels of branch history

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Reducing indirect function call overhead in C++ programs

POPL '94 Proceedings of the 21st ACM SIGPLAN-SIGACT symposium on Principles of programming languages
ATOM: a system for building customized program analysis tools

PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Branch Target Buffer Design and Optimization

IEEE Transactions on Computers
A study of branch prediction strategies

ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture

Reducing branch costs via branch alignment

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Corpus-based static branch prediction

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Next cache line and set prediction

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Optimization of instruction fetch mechanisms for high issue rates

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Instruction cache fetch policies for speculative execution

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A modified approach to data cache management

Proceedings of the 28th annual international symposium on Microarchitecture
Partial resolution in branch target buffers

Proceedings of the 28th annual international symposium on Microarchitecture
A system level perspective on branch architecture performance

Proceedings of the 28th annual international symposium on Microarchitecture
Correlation and aliasing in dynamic branch predictors

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Evidence-based static branch prediction using machine learning

ACM Transactions on Programming Languages and Systems (TOPLAS)
Improving the Accuracy of History-Based Branch Prediction

IEEE Transactions on Computers
Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading

ACM Transactions on Computer Systems (TOCS)
The agree predictor: a mechanism for reducing negative branch history interference

Proceedings of the 24th annual international symposium on Computer architecture
Partial Resolution in Branch Target Buffers

IEEE Transactions on Computers
Modeled and Measured Instruction Fetching Performance for Superscalar Microprocessors

IEEE Transactions on Parallel and Distributed Systems
Improving prediction for procedure returns with return-address-stack repair mechanisms

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
A scalable front-end architecture for fast instruction delivery

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Branch Prediction, Instruction-Window Size, and Cache Size: Performance Trade-Offs and Simulation Techniques

IEEE Transactions on Computers
Software profiling for hot path prediction: less is more

ACM SIGPLAN Notices
Optimizations Enabled by a Decoupled Front-End Architecture

IEEE Transactions on Computers
Software profiling for hot path prediction: less is more

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Cool-cache for hot multimedia

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
A Comprehensive Analysis of Indirect Branch Prediction

ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Fetching instruction streams

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Performance Modeling Using Object-Oriented Execution-Driven Simulation}

SS '96 Proceedings of the 29th Annual Simulation Symposium (SS '96)
Cool-Cache: A compiler-enabled energy efficient data caching framework for embedded/multimedia processors

ACM Transactions on Embedded Computing Systems (TECS)
Power-Aware Branch Prediction: Characterization and Design

IEEE Transactions on Computers
A low-complexity fetch architecture for high-performance superscalar processors

ACM Transactions on Architecture and Code Optimization (TACO)
Energy-efficient and high-performance instruction fetch using a block-aware ISA

ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Block-aware instruction set architecture

ACM Transactions on Architecture and Code Optimization (TACO)
Wide and efficient trace prediction using the local trace predictor

Proceedings of the 20th annual international conference on Supercomputing
Pipeline spectroscopy

Proceedings of the 2007 workshop on Experimental computer science
Pipeline spectroscopy

ecs'07 Experimental computer science on Experimental computer science
Improving instruction delivery with a block-aware ISA

Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Reducing instruction fetch energy in multi-issue processors

ACM Transactions on Architecture and Code Optimization (TACO)

Quantified Score

Hi-index	0.01

Visualization

Abstract

Accurate branch prediction is critical to performance; mispredicted branches mean that ten's of cycles may be wasted in superscalar architectures. Architectures combining very effective branch prediction mechanisms coupled with modified branch target buffers (BTB's) have been proposed for wide-issue processors. These mechanisms require considerable processor resources. Concurrently, the larger address space of 64-bit architectures introduce new obstacles and opportunities. A larger address space means branch target buffers become more expensive. In this paper, we show how a combination of less expensive mechanisms can achieve better performance than BTB's. This combination relies on a number of design choices described in the paper. We used trace-driven simulation to show that our proposed design, which uses fewer resources, offers better performance than previously proposed alternatives for most programs, and indicate how to further improve this design.