An integrated partitioning and scheduling based branch decoupling

  • Authors:
  • Pramod Ramarao; Akhilesh Tyagi

  • Affiliations:
  • Department of Electrical & Computer Engineering, Iowa State University, Ames (both authors)

  • Venue:
  • ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
  • Year:
  • 2005

Abstract

Control hazards induced by conditional branches cause significant performance loss in modern out-of-order superscalar processors. Dynamic branch prediction techniques help alleviate the penalties associated with conditional branch instructions; however, branches still constitute one of the main hurdles to achieving higher ILP. Dynamic branch prediction relies on the temporal locality of branches and the spatial correlations between them. Branch decoupling is another mechanism, one that exploits the innate lead of the branch schedule over the rest of the computation. The compiler is responsible for generating two maximally decoupled instruction streams: the branch stream and the program stream. Our earlier trace-based evaluation of branch decoupling demonstrated a performance advantage of 12% to 46% over 2-level branch prediction. However, it is not known how much of this gain is achievable through static, compiler-driven decoupling. This paper partially answers that question. A novel decoupling algorithm that integrates graph bipartitioning and scheduling was implemented in the GNU C compiler to generate executables with two instruction streams. These executables were run on a branch-decoupled architecture simulator with superscalar cores for the branch-stream and program-stream processors. Simulations show average performance improvements of 7.7% and 5.5% for the integer and floating-point benchmarks of the SPEC2000 suite, respectively.
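To make the idea of splitting a program into a branch stream and a program stream concrete, here is a minimal sketch under a deliberately simplified model. It is not the paper's algorithm: the authors integrate graph bipartitioning with scheduling, whereas this sketch only places the backward dependence slice of every conditional branch in the branch stream and everything else in the program stream. The function `partition_streams`, the `deps` dictionary format, and the instruction ids are all hypothetical and chosen only for illustration.

```python
from collections import deque


def partition_streams(deps, branches):
    """Naively split instructions into a branch stream and a program stream.

    Hypothetical model (not the paper's algorithm):
      deps     -- dict mapping each instruction id to the ids it reads from
      branches -- set of conditional-branch instruction ids
    Returns (branch_stream, program_stream, cross_edges).
    """
    # Every instruction mentioned anywhere, including pure producers.
    all_nodes = set(deps).union(*deps.values())

    # Backward slice: a branch plus everything its condition transitively
    # depends on must run in the branch stream.
    branch_stream = set()
    work = deque(branches)
    while work:
        node = work.popleft()
        if node in branch_stream:
            continue
        branch_stream.add(node)
        work.extend(deps.get(node, ()))

    program_stream = all_nodes - branch_stream

    # Because the slice is closed under dependences, values only flow from
    # the branch stream to the program stream; each such edge is a value
    # that would have to be communicated between the two processors.
    cross_edges = sorted(
        (src, dst)
        for dst in program_stream
        for src in deps.get(dst, ())
        if src in branch_stream
    )
    return branch_stream, program_stream, cross_edges


# Hypothetical straight-line fragment:
deps = {
    "i1": [],            # load a
    "i2": [],            # load b
    "i3": ["i1"],        # t = a + 1
    "i4": ["i3"],        # branch if t > 0
    "i5": ["i2", "i3"],  # u = b * t
    "i6": ["i5"],        # store u
}
branch_stream, program_stream, cross_edges = partition_streams(deps, {"i4"})
```

In this example, `i1`, `i3`, and `i4` land in the branch stream, `i2`, `i5`, and `i6` in the program stream, and the single cross edge `("i3", "i5")` is the one value that must be forwarded. A real decoupling pass would weigh the size of each stream against such communication edges and the resulting schedule, which is the trade-off the paper's integrated partitioning and scheduling targets.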