Long-latency branches: how much do they matter?

  • Authors: Abhas Kumar, Nisheet Jain, Mainak Chaudhuri
  • Affiliation: Indian Institute of Technology, Kanpur, India (all three authors)
  • Venue: ACM SIGARCH Computer Architecture News
  • Year: 2006

Abstract

Dynamic branch prediction plays a key role in delivering high performance in modern microprocessors. The cycles between the prediction of a branch and its execution constitute the branch misprediction penalty, because a misprediction can be detected only after the branch executes. This penalty depends not only on the depth of the pipeline, but also on the availability of the branch's operands. Fetched branches belonging to the dependence chains of loads that miss in the L1 data cache exhibit very high misprediction penalties due to the delay in execution caused by the unavailability of their operands. We call these long-latency branches. It has been speculated that predicting such branches accurately, or identifying mispredicted ones before they execute, would be beneficial. In this paper, we show that in a traditional pipeline the frequency of mispredicted long-latency branches is extremely small; therefore, predicting all of these branches correctly offers no performance improvement. Architectures that allow checkpoint-assisted speculative load retirement fetch a large number of branches belonging to the dependence chains of the speculatively retired loads, and accurate prediction of these branches is extremely important for staying on the correct path. We show that even if all the branches belonging to the dependence chains of loads that miss in the L1 data cache are predicted correctly, only four of the twelve control speculation-sensitive applications selected from the SPECInt2000 and BioBench suites exhibit visible performance improvement. This is an upper bound on the achievable improvement in these architectures. We conclude that it may not be worth designing specialized hardware to improve the prediction accuracy of long-latency branches.
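The definition of a long-latency branch lends itself to a simple illustration. The Python sketch below is a hypothetical simulator fragment, not taken from the paper: it tags fetched branches whose source operands lie (transitively) on the dependence chain of a load that missed in the L1 data cache. For simplicity it ignores miss completion timing, which in a real pipeline would eventually clear the dependence; the instruction-record format is invented for this example.

```python
def tag_long_latency_branches(instructions):
    """Tag branches fed by the dependence chain of an L1 D-cache miss.

    instructions: iterable of dicts (hypothetical trace format) with keys
      'op'      -- 'load', 'branch', or 'alu'
      'srcs'    -- list of source register names
      'dst'     -- destination register name, or None
      'l1_miss' -- loads only: True if the load missed in the L1 D-cache
    Yields each instruction; branches gain a 'long_latency' flag.
    """
    poisoned = set()  # registers whose values depend on an L1 miss
    for insn in instructions:
        depends_on_miss = any(r in poisoned for r in insn.get('srcs', []))
        if insn['op'] == 'load' and insn.get('l1_miss'):
            poisoned.add(insn['dst'])          # the miss poisons its result
        elif insn.get('dst') is not None:
            if depends_on_miss:
                poisoned.add(insn['dst'])      # propagate along the chain
            else:
                poisoned.discard(insn['dst'])  # overwritten by a clean value
        if insn['op'] == 'branch':
            insn['long_latency'] = depends_on_miss
        yield insn

trace = [
    {'op': 'load',   'srcs': ['r1'], 'dst': 'r2', 'l1_miss': True},
    {'op': 'alu',    'srcs': ['r2'], 'dst': 'r3'},
    {'op': 'branch', 'srcs': ['r3'], 'dst': None},  # tagged long-latency
]
tagged = list(tag_long_latency_branches(trace))
assert tagged[-1]['long_latency']
```

In hardware, the same propagation could plausibly be realized as a per-register poison bit carried through the rename map, though the paper's result suggests such machinery is unlikely to pay off in a traditional pipeline.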