Profile-assisted Compiler Support for Dynamic Predication in Diverge-Merge Processors

  • Authors:
  • Hyesoon Kim;José A. Joao;Onur Mutlu;Yale N. Patt

  • Affiliations:
  • University of Texas at Austin;University of Texas at Austin;Microsoft Research;University of Texas at Austin

  • Venue:
  • Proceedings of the International Symposium on Code Generation and Optimization
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Dynamic predication has been proposed to reduce the branch misprediction penalty due to hard-to-predict branch instructions. A recently proposed dynamic predication architecture, the diverge-merge processor (DMP), provides large performance improvements by dynamically predicating a large set of complex control-flow graphs that result in branch mispredictions. DMP requires significant support from a profiling compiler to determine which branch instructions and control-flow structures can be dynamically predicated. However, previous work on dynamic predication did not extensively examine the tradeoffs involved in profiling and code generation for dynamic predication architectures. This paper describes compiler support for obtaining high performance in the diverge-merge processor. We describe new profile-driven algorithms and heuristics to select branch instructions that are suitable and profitable for dynamic predication. We also develop a new profile-based analytical cost-benefit model to estimate, at compiletime, the performance benefits of the dynamic predication of different types of control-flow structures including complex hammocks and loops. Our evaluations show that DMP can provide 20.4% average performance improvement over a conventional processor on SPEC integer benchmarks with our optimized compiler algorithms, whereas the average performance improvement of the best-performing alternative simple compiler algorithm is 4.5%. We also find that, with the proposed algorithms, DMP performance is not significantly affected by the differences in profile- and run-time input data sets.