Alternative implementations of two-level adaptive branch prediction
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
A comparison of dynamic branch predictors that use two levels of branch history
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Increasing the instruction fetch rate via multiple branch prediction and a branch address cache
ICS '93 Proceedings of the 7th international conference on Supercomputing
Next cache line and set prediction
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Optimization of instruction fetch mechanisms for high issue rates
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Dynamic path-based branch correlation
Proceedings of the 28th annual international symposium on Microarchitecture
Control flow prediction with tree-like subgraphs for superscalar processors
Proceedings of the 28th annual international symposium on Microarchitecture
Multiple-block ahead branch predictors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Trace cache: a low latency approach to high bandwidth instruction fetching
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Path-based next trace prediction
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Improving trace cache effectiveness with branch promotion and trace packing
Proceedings of the 25th annual international symposium on Computer architecture
Putting the fill unit to work: dynamic optimizations for trace cache microprocessors
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Evaluation of Design Options for the Trace Cache Fetch Mechanism
IEEE Transactions on Computers - Special issue on cache memory and related problems
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
The PowerPC 604 RISC microprocessor
IEEE Micro
Proceedings of the 27th annual international symposium on Computer architecture
PipeRench implementation of the instruction path coprocessor
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
On Augmenting Trace Cache for High-Bandwidth Value Prediction
IEEE Transactions on Computers
Performance Evaluation of Exception Handling in I/O Libraries
DSN '01 Proceedings of the 2001 International Conference on Dependable Systems and Networks (formerly: FTCS)
Improving branch prediction accuracy with parallel conservative correctors
Proceedings of the 2nd conference on Computing frontiers
Wide and efficient trace prediction using the local trace predictor
Proceedings of the 20th annual international conference on Supercomputing
Using NAND flash memory for executing large volume real-time programs in automotive embedded systems
EMSOFT '10 Proceedings of the tenth ACM international conference on Embedded software
Proceedings of the 9th conference on Computing Frontiers
Hi-index | 0.00 |
The need for multiple branch prediction is inherent to wide instruction fetching. This paper presents a completion time multiple branch predictor called the Tree-based Multiple Branch Predictor (TMP) that builds on previous single branch prediction techniques. It employs a tree structure of branch predictors, or tree-node predictors, and achieves accurate multiple branch prediction by leveraging the high accuracies of the individual branch predictors. A highly-efficient TMP design uses the 2-bit saturating counters for the tree-node predictors. To achieve higher prediction rate, the TMP employs two-level schemes for the tree-node predictors resulting in a three-level TMP design. Placing the TMP at completion time reduces the critical latency in the front-end of the pipeline; the resultant longer update latency does not significantly impact the overall performance. In this paper the TMP is applied to a trace cache design and shown to be very effective in increasing its performance.Results: A realistic-size TMP (72KB) can predict 1, 2, 3, and 4 consecutive blocks with compounded prediction accuracies of 96%, 93%, 87%, and 82%, respectively. The block-based trace cache with this TMP achieves 4.75 IPC for SPECint95 on an idealized machine, which is a 20% performance improvement over the original design [1]. This improved performance is 8% above that of a conventional I-cache design with perfect single branch prediction.