Recycling waste: exploiting wrong-path execution to improve branch prediction

Authors:
Haitham Akkary;Srikanth T. Srinivasan;Konrad Lai
Affiliations:
Microprocessor Research Lab, Intel Corporation;Microprocessor Research Lab, Intel Corporation;Microprocessor Research Lab, Intel Corporation
Venue:
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Year:
2003

Citing 18
Cited 5

Two-level adaptive training branch prediction

MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Improving data cache performance by pre-executing instructions under a cache miss

ICS '97 Proceedings of the 11th international conference on Supercomputing
Dynamic instruction reuse

Proceedings of the 24th annual international symposium on Computer architecture
A dynamic multithreading processor

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Simultaneous subordinate microthreading (SSMT)

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Register integration: a simple and efficient implementation of squash reuse

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Execution-based prediction using speculative slices

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Speculative precomputation: long-range prefetching of delinquent loads

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Data prefetching by dependence graph precomputation

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
The optimum pipeline depth for a microprocessor

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Increasing processor performance by implementing deeper pipelines

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Design tradeoffs for the Alpha EV8 conditional branch predictor

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Difficult-path branch prediction using subordinate microthreads

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Dynamic speculative precomputation

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Dynamic Branch Prediction with Perceptrons

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Improving processor performance by dynamically pre-processing the instruction stream

Improving processor performance by dynamically pre-processing the instruction stream

The Impact of Incorrectly Speculated Memory Operations in a Multithreaded Architecture

IEEE Transactions on Parallel and Distributed Systems
Tight analysis of the performance potential of thread speculation using spec CPU 2006

Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Reexecution and Selective Reuse in Checkpoint Processors

Transactions on High-Performance Embedded Architectures and Compilers II
The potential of using dynamic information flow analysis in data value prediction

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Leveraging Strength-Based Dynamic Information Flow Analysis to Enhance Data Value Prediction

ACM Transactions on Architecture and Code Optimization (TACO)

Quantified Score

Hi-index	0.00

Visualization

Abstract

Despite continuous improvement in branch prediction algorithms, branch misprediction remains a major limitation on microprocessor performance. As pipelines are widened or stretched deeper, branch prediction will become even more crucial. This paper taps into a currently wasted resource, wrong-path execution, to help improve branch prediction. Due to control independence, often the outcomes of branches that are executed along the wrong-path match the outcomes on the correct-path. Current branch prediction methods rely on correlation between branches on the correct path, therefore leaving potentially useful wrong-path branch information unexploited. We present in this paper a new, very simple, and very effective method that extends branch prediction to allow the recycling of wrong-path branch outcomes at the fetch stage. Simulations of deeply pipelined processors using a selected set of SpecInt 2000 and other benchmarks, with more than 5 branch mispredictions per thousand micro-operations, show that branch misprediction rate can be reduced by up to 30%. Depending on the pipeline depth, the corresponding average performance improvement varies from 5% to 20%.