Hardware Support for Control Transfers in Code Caches

  • Authors:
  • Ho-Seop Kim;James E. Smith

  • Affiliations:
  • Department of Electrical and Computer Engineering, University of Wisconsin - Madison;Department of Electrical and Computer Engineering, University of Wisconsin - Madison

  • Venue:
  • Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
  • Year:
  • 2003

Abstract

Many dynamic optimization and/or binary translation systems hold optimized/translated superblocks in a code cache. Conventional code caching systems suffer from overheads when control is transferred from one cached superblock to another, especially via register-indirect jumps. The basic problem is that instruction addresses in the code cache are different from those in the original program binary. Therefore, performance for register-indirect jumps depends on the ability to translate efficiently from source binary PC values to code cache PC values.

We analyze several key aspects of superblock chaining and find that a conventional baseline code cache with software jump target prediction results in 14.6% IPC loss versus the original binary. We identify the inability to use a conventional return address stack as the most significant performance limiter in code cache systems. We introduce a modified software prediction technique that reduces the IPC loss to 11.4%. This technique is based on a technique used in threaded code interpreters.

A number of hardware mechanisms, including a specialized return address stack and a hardware cache for translated jump target addresses, are studied for efficiently supporting register-indirect jumps. Once all the chaining overheads are removed by these support mechanisms, a superblock-based code cache improves performance due to a better branch prediction rate, improved I-cache locality, and increased chances of straight-line fetches. Simulation results show a 7.7% IPC improvement over a current generation 4-way superscalar processor.
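The translation problem described in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation; it is a minimal illustration, assuming a dictionary stands in for the source-PC to code-cache-PC map and that a register-indirect jump carries one inlined software prediction. The names `CodeCache`, `make_indirect_jump`, and `slow_lookups`, and all addresses, are hypothetical.

```python
class CodeCache:
    """Toy model of a code cache's source-PC to cache-PC map (hypothetical)."""

    def __init__(self):
        self.translation = {}  # source binary PC -> code cache PC
        self.slow_lookups = 0  # number of full map lookups (the costly path)

    def add_superblock(self, source_pc, cache_pc):
        self.translation[source_pc] = cache_pc

    def lookup(self, source_pc):
        """Full translation lookup; stands in for the expensive dispatch path."""
        self.slow_lookups += 1
        return self.translation[source_pc]


def make_indirect_jump(cache, predicted_source_pc):
    """Build a jump stub with one inlined software target prediction."""
    predicted_cache_pc = cache.translation[predicted_source_pc]

    def jump(target_source_pc):
        # The register still holds a source binary PC, so compare it
        # against the predicted source PC before branching.
        if target_source_pc == predicted_source_pc:
            return predicted_cache_pc  # hit: branch directly within the cache
        return cache.lookup(target_source_pc)  # miss: fall back to the map

    return jump


# Illustrative addresses only.
cache = CodeCache()
cache.add_superblock(0x400100, 0x9000)
cache.add_superblock(0x400200, 0x9040)
jump = make_indirect_jump(cache, predicted_source_pc=0x400100)
```

In this model a correctly predicted target avoids the map lookup entirely, while a mispredicted one pays for it. That gap is the kind of overhead that the hardware cache for translated jump target addresses is meant to reduce.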