Long-latency branches: how much do they matter?

  • Authors: Abhas Kumar, Nisheet Jain, Mainak Chaudhuri
  • Affiliation: Indian Institute of Technology, Kanpur, India (all three authors)
  • Venue: ACM SIGARCH Computer Architecture News
  • Year: 2006

Abstract

Dynamic branch prediction plays a key role in delivering high performance in modern microprocessors. The cycles between the prediction of a branch and its execution constitute the branch misprediction penalty, because a misprediction can be detected only after the branch executes. This penalty depends not only on the depth of the pipeline, but also on the availability of the branch's operands. Fetched branches belonging to the dependence chains of loads that miss in the L1 data cache exhibit very high misprediction penalties due to the delay in execution caused by the unavailability of their operands. We call these long-latency branches. It has been speculated that predicting such branches accurately, or identifying mispredicted ones before they execute, would be beneficial. In this paper, we show that in a traditional pipeline the frequency of mispredicted long-latency branches is extremely small; therefore, predicting all of these branches correctly offers no performance improvement. Architectures that allow checkpoint-assisted speculative load retirement fetch a large number of branches belonging to the dependence chains of the speculatively retired loads, and accurate prediction of these branches is extremely important for staying on the correct path. We show that even if all the branches belonging to the dependence chains of loads that miss in the L1 data cache are predicted correctly, only four of the twelve control speculation-sensitive applications selected from the SPECInt2000 and BioBench suites exhibit visible performance improvement. This is an upper bound on the achievable improvement in these architectures. We conclude that it may not be worth designing specialized hardware to improve the prediction accuracy of long-latency branches.
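The definition of a long-latency branch lends itself to a simple illustration. The Python sketch below is a hypothetical simulator fragment, not taken from the paper: it tags fetched branches whose source operands lie (transitively) on the dependence chain of a load that missed in the L1 data cache. For simplicity it ignores miss completion timing, which in a real pipeline would eventually clear the dependence; the instruction-record format is invented for this example.

```python
def tag_long_latency_branches(instructions):
    """Tag branches fed by the dependence chain of an L1 D-cache miss.

    instructions: iterable of dicts (hypothetical trace format) with keys
      'op'      -- 'load', 'branch', or 'alu'
      'srcs'    -- list of source register names
      'dst'     -- destination register name, or None
      'l1_miss' -- loads only: True if the load missed in the L1 D-cache
    Yields each instruction; branches gain a 'long_latency' flag.
    """
    poisoned = set()  # registers whose values depend on an L1 miss
    for insn in instructions:
        depends_on_miss = any(r in poisoned for r in insn.get('srcs', []))
        if insn['op'] == 'load' and insn.get('l1_miss'):
            poisoned.add(insn['dst'])          # the miss poisons its result
        elif insn.get('dst') is not None:
            if depends_on_miss:
                poisoned.add(insn['dst'])      # propagate along the chain
            else:
                poisoned.discard(insn['dst'])  # overwritten by a clean value
        if insn['op'] == 'branch':
            insn['long_latency'] = depends_on_miss
        yield insn

trace = [
    {'op': 'load',   'srcs': ['r1'], 'dst': 'r2', 'l1_miss': True},
    {'op': 'alu',    'srcs': ['r2'], 'dst': 'r3'},
    {'op': 'branch', 'srcs': ['r3'], 'dst': None},  # tagged long-latency
]
tagged = list(tag_long_latency_branches(trace))
assert tagged[-1]['long_latency']
```

In hardware, the same propagation could plausibly be realized as a per-register poison bit carried through the rename map, though the paper's result suggests such machinery is unlikely to pay off in a traditional pipeline.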