Completion time multiple branch prediction for enhancing trace cache performance

Authors:
Ryan Rakvic;Bryan Black;John Paul Shen
Affiliations:
Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA;Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA;Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA
Venue:
Proceedings of the 27th annual international symposium on Computer architecture
Year:
2000

References (17):

Alternative implementations of two-level adaptive branch prediction
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture

A comparison of dynamic branch predictors that use two levels of branch history
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture

Increasing the instruction fetch rate via multiple branch prediction and a branch address cache
ICS '93 Proceedings of the 7th international conference on Supercomputing

Next cache line and set prediction
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture

Optimization of instruction fetch mechanisms for high issue rates
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture

Dynamic path-based branch correlation
Proceedings of the 28th annual international symposium on Microarchitecture

Control flow prediction with tree-like subgraphs for superscalar processors
Proceedings of the 28th annual international symposium on Microarchitecture

Multiple-block ahead branch predictors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems

Trace cache: a low latency approach to high bandwidth instruction fetching
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture

Path-based next trace prediction
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture

Trace processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture

Improving trace cache effectiveness with branch promotion and trace packing
Proceedings of the 25th annual international symposium on Computer architecture

Putting the fill unit to work: dynamic optimizations for trace cache microprocessors
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture

Evaluation of Design Options for the Trace Cache Fetch Mechanism
IEEE Transactions on Computers - Special issue on cache memory and related problems

The block-based trace cache
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture

The PowerPC 604 RISC microprocessor
IEEE Micro

The PowerPC User Instruction Set Architecture
IEEE Micro

Cited by (8):

Instruction path coprocessors
Proceedings of the 27th annual international symposium on Computer architecture

PipeRench implementation of the instruction path coprocessor
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture

On Augmenting Trace Cache for High-Bandwidth Value Prediction
IEEE Transactions on Computers

Performance Evaluation of Exception Handling in I/O Libraries
DSN '01 Proceedings of the 2001 International Conference on Dependable Systems and Networks (formerly: FTCS)

Improving branch prediction accuracy with parallel conservative correctors
Proceedings of the 2nd conference on Computing frontiers

Wide and efficient trace prediction using the local trace predictor
Proceedings of the 20th annual international conference on Supercomputing

Using NAND flash memory for executing large volume real-time programs in automotive embedded systems
EMSOFT '10 Proceedings of the tenth ACM international conference on Embedded software

CoreSymphony architecture
Proceedings of the 9th conference on Computing Frontiers

Abstract

The need for multiple branch prediction is inherent to wide instruction fetching. This paper presents a completion-time multiple branch predictor called the Tree-based Multiple Branch Predictor (TMP), which builds on previous single branch prediction techniques. The TMP employs a tree structure of branch predictors, or tree-node predictors, and achieves accurate multiple branch prediction by leveraging the high accuracies of the individual branch predictors. A highly efficient TMP design uses 2-bit saturating counters for the tree-node predictors. To achieve a higher prediction rate, the TMP employs two-level schemes for the tree-node predictors, resulting in a three-level TMP design. Placing the TMP at completion time reduces the critical latency in the front end of the pipeline; the resulting longer update latency does not significantly impact overall performance. In this paper, the TMP is applied to a trace cache design and shown to be very effective in increasing its performance.

Results: A realistic-size TMP (72 KB) can predict 1, 2, 3, and 4 consecutive blocks with compounded prediction accuracies of 96%, 93%, 87%, and 82%, respectively. The block-based trace cache with this TMP achieves 4.75 IPC for SPECint95 on an idealized machine, a 20% performance improvement over the original design [1]. This performance is 8% above that of a conventional I-cache design with perfect single branch prediction.
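
To make the tree organization described in the abstract more concrete, the following is a minimal Python sketch of a tree of 2-bit saturating counters that predicts several consecutive branch outcomes at once. It is an illustration under stated assumptions, not the paper's design: the class name `TreeMultiBranchPredictor`, the indexing of trees by `start_addr % num_trees`, the default `depth` and `num_trees`, and the initial counter value are all hypothetical choices, and the two-level variant mentioned in the abstract is not modeled.

```python
class TreeMultiBranchPredictor:
    """Hypothetical sketch of a tree of 2-bit saturating counters.

    Assumption: one binary tree per table entry, selected by the starting
    fetch address. The node at depth d predicts the d-th branch, given the
    outcomes already predicted for the earlier branches on that path.
    """

    def __init__(self, depth=4, num_trees=1024):
        self.depth = depth
        self.num_trees = num_trees
        # A complete binary tree with `depth` levels has 2**depth - 1 nodes.
        # Counters range over 0..3 and start at 1 (weakly not-taken).
        self.trees = [[1] * (2 ** depth - 1) for _ in range(num_trees)]

    def _tree(self, start_addr):
        # Assumed indexing scheme: simple modulo on the start address.
        return self.trees[start_addr % self.num_trees]

    def predict(self, start_addr):
        """Return a list of `depth` predicted outcomes (True = taken)."""
        tree = self._tree(start_addr)
        node = 0
        outcomes = []
        for _ in range(self.depth):
            taken = tree[node] >= 2  # upper half of the counter range
            outcomes.append(taken)
            # Heap-style children: left (not taken) = 2n+1, right (taken) = 2n+2.
            node = 2 * node + (2 if taken else 1)
        return outcomes

    def update(self, start_addr, actual):
        """Train the counters along the path of actual outcomes.

        Assumption: called once the branches have completed, mirroring
        the completion-time placement described in the abstract.
        """
        tree = self._tree(start_addr)
        node = 0
        for taken in actual[: self.depth]:
            if taken:
                tree[node] = min(3, tree[node] + 1)
            else:
                tree[node] = max(0, tree[node] - 1)
            node = 2 * node + (2 if taken else 1)
```

In this sketch, a fetch unit would call `predict(start_addr)` to obtain a path of outcomes for consecutive blocks, and the completion stage would later call `update(start_addr, actual)` with the resolved outcomes. Only the nodes along the actual path are trained, so each node learns the behavior of one branch in the context of the outcomes that preceded it.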